NOVEL METHODS OF CONSTRUCTING LIBRARIES

COMPRISING DISPLAYED AND/OR EXPRESSED 
MEMBERS OF A DIVERSE FAMILY OF PEPTIDES, 
POLYPEPTIDES OR PROTEINS AND THE NOVEL LIBRARIES 

This application is a continuation-in-part of 
United States provisional application 06/198,069, filed 
April 17, 2000, a continuation-in-part of United States 
patent application 09/837,306, filed on April 17, 2001, 
and a continuation-in-part of United States application 
XX/XXX,XXX, filed by Express Mail (EI125454535US) on 
October 25, 2001. All of the earlier applications are 
specifically incorporated by reference herein. 

The present invention relates to libraries of 
genetic packages that display and/or express a member 
of a diverse family of peptides, polypeptides or 
proteins and collectively display and/or express at 
least a portion of the diversity of the family. In an 
alternative embodiment, the invention relates to 
libraries that include a member of a diverse family of 
peptides, polypeptides or proteins and collectively 
comprise at least a portion of the diversity of the 
family. In a preferred embodiment, the displayed 
and/or expressed polypeptides are human Fabs. 

More specifically, the invention is directed 
to the methods of cleaving single-stranded nucleic 
acids at chosen locations, the cleaved nucleic acids 
encoding, at least in part, the peptides, polypeptides 
or proteins displayed on the genetic packages of,
and/or expressed in, the libraries of the invention.
In a preferred embodiment, the genetic packages are
filamentous phage or phagemids or yeast.

The present invention further relates to 
vectors for displaying and/or expressing a diverse 
family of peptides, polypeptides or proteins. 

The present invention further relates to 
methods of screening the libraries of the invention and 
to the peptides, polypeptides and proteins identified 
by such screening. 

BACKGROUND OF THE INVENTION 

It is now common practice in the art to 
prepare libraries of genetic packages that display, 
express or comprise a member of a diverse family of 
peptides, polypeptides or proteins and collectively 
display, express or comprise at least a portion of the 
diversity of the family. In many common libraries, the 
peptides, polypeptides or proteins are related to 
antibodies. Often, they are Fabs or single chain 
antibodies.

In general, the DNAs that encode members of 
the families to be displayed and/or expressed must be 
amplified before they are cloned and used to display 
and/or express the desired member. Such amplification 
typically makes use of forward and backward primers. 

Such primers can be complementary to
sequences native to the DNA to be amplified or
complementary to oligonucleotides attached at the 5' or
3' ends of that DNA. Primers that are complementary to
sequences native to the DNA to be amplified are
disadvantaged in that they bias the members of the
families to be displayed. Only those members that
contain a sequence in the native DNA that is
substantially complementary to the primer will be
amplified. Those that do not will be absent from the
family. For those members that are amplified, any
diversity within the primer region will be suppressed.

For example, in European patent 368,684 B1,
the primer that is used is at the 5' end of the VH
region of an antibody gene. It anneals to a sequence
region in the native DNA that is said to be
"sufficiently well conserved" within a single species.
Such primer will bias the members amplified to those
having this "conserved" region. Any diversity within
this region is extinguished.

It is generally accepted that human antibody
genes arise through a process that involves a
combinatorial selection of V and J or V, D, and J
followed by somatic mutations. Although most diversity
occurs in the Complementarity Determining Regions (CDRs),
diversity also occurs in the more conserved Framework
Regions (FRs) and at least some of this diversity
confers or enhances specific binding to antigens (Ag).
As a consequence, libraries should contain as much of
the CDR and FR diversity as possible.

To clone the amplified DNAs of the peptides,
polypeptides or proteins that they encode for display
on a genetic package and/or for expression, the DNAs
must be cleaved to produce appropriate ends for
ligation to a vector. Such cleavage is generally
effected using restriction endonuclease recognition
sites carried on the primers. When the primers are at
the 5' end of DNA produced from reverse transcription
of RNA, such restriction leaves deleterious 5'
untranslated regions in the amplified DNA. These
regions interfere with expression of the cloned genes
and thus the display of the peptides, polypeptides and
proteins coded for by them.

SUMMARY OF THE INVENTION 

It is an object of this invention to provide 
novel methods for constructing libraries that display, 
express or comprise a member of a diverse family of 
peptides, polypeptides or proteins and collectively 
display, express or comprise at least a portion of the 
diversity of the family. These methods are not biased 
toward DNAs that contain native sequences that are 
complementary to the primers used for amplification. 
They also enable any sequences that may be deleterious 
to expression to be removed from the amplified DNA 
before cloning and displaying and/or expressing. 

It is another object of this invention to 
provide a method for cleaving single-stranded nucleic 
acid sequences at a desired location, the method 
comprising the steps of: 

(i) contacting the nucleic acid with a 
single-stranded oligonucleotide, the 
oligonucleotide being functionally 
complementary to the nucleic acid in the 
region in which cleavage is desired and 
including a sequence that with its complement 
in the nucleic acid forms a restriction 
endonuclease recognition site that on 
restriction results in cleavage of the 
nucleic acid at the desired location; and 

(ii) cleaving the nucleic acid solely at 
the recognition site formed by the 
complementation of the nucleic acid and the 
oligonucleotide; 



the contacting and the cleaving steps being performed 
at a temperature sufficient to maintain the nucleic 
acid in substantially single-stranded form, the 
oligonucleotide being functionally complementary to the 
nucleic acid over a large enough region to allow the 
two strands to associate such that cleavage may occur 
at the chosen temperature and at the desired location, 
and the cleavage being carried out using a restriction 
endonuclease that is active at the chosen temperature.

It is a further object of this invention to
provide an alternative method for cleaving single- 
stranded nucleic acid sequences at a desired location, 
the method comprising the steps of: 

(i) contacting the nucleic acid with a 
partially double-stranded oligonucleotide, 
the single-stranded region of the 
oligonucleotide being functionally 
complementary to the nucleic acid in the 
region in which cleavage is desired, and the 
double-stranded region of the oligonucleotide 
having a restriction endonuclease recognition 
site; and 

(ii) cleaving the nucleic acid solely at 
the cleavage site formed by the 
complementation of the nucleic acid and the 
single-stranded region of the 
oligonucleotide; 

the contacting and the cleaving steps being performed 
at a temperature sufficient to maintain the nucleic 
acid in substantially single-stranded form, the 
oligonucleotide being functionally complementary to the 
nucleic acid over a large enough region to allow the 
two strands to associate such that cleavage may occur 



at the chosen temperature and at the desired location, 
and the cleavage being carried out using a restriction 
endonuclease that is active at the chosen temperature. 

In an alternative embodiment of this object 
of the invention, the restriction endonuclease 
recognition site is not initially located in the 
double-stranded part of the oligonucleotide. Instead, 
it is part of an amplification primer, which primer is 
complementary to the double-stranded region of the 
oligonucleotide. On amplification of the DNA-partially 
double-stranded combination, the restriction 
endonuclease recognition site carried on the primer 
becomes part of the DNA. It can then be used to cleave 
the DNA. 

Preferably, the restriction endonuclease 
recognition site is that of a Type II-S restriction 
endonuclease whose cleavage site is located at a known 
distance from its recognition site. 
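
The fixed offset between recognition site and cleavage site can be illustrated with a short calculation. The following is only a sketch under stated assumptions: the function name, the test sequence and the offsets are hypothetical, only the orientation in which the recognition site reads 5' to 3' on the given strand is handled, and the GGATG(9/13) values are the ones commonly listed for FokI, used here purely as example parameters.

```python
def type_iis_cut_positions(seq, site, top_offset, bottom_offset):
    """Locate cut points for a Type II-S style enzyme (illustrative only).

    The enzyme is modelled as cutting the top strand `top_offset` bases and
    the bottom strand `bottom_offset` bases 3' of the last base of `site`.
    Returned positions are indices into `seq`; a cut at index k falls
    between seq[k-1] and seq[k]. Sites too close to the end are skipped.
    """
    cuts = []
    start = seq.find(site)
    while start != -1:
        site_end = start + len(site)
        top_cut = site_end + top_offset
        bottom_cut = site_end + bottom_offset
        if max(top_cut, bottom_cut) <= len(seq):
            cuts.append((top_cut, bottom_cut))
        start = seq.find(site, start + 1)
    return cuts


# Hypothetical sequence: one GGATG site followed by 20 arbitrary bases.
example = "AAGGATG" + "C" * 20
print(type_iis_cut_positions(example, "GGATG", 9, 13))
```

Because the cut position depends only on where the recognition site sits, the site can be placed in a synthetic region while the cut falls in the adjoining sequence.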

It is another object of the present invention 
to provide a method of capturing DNA molecules that 
comprise a member of a diverse family of DNAs and 
collectively comprise at least a portion of the 
diversity of the family. These DNA molecules in 
single-stranded form have been cleaved by one of the 
methods of this invention. This method involves 
ligating the individual single-stranded DNA members of 
the family to a partially duplex DNA complex. The 
method comprises the steps of: 

(i) contacting a single-stranded nucleic 
acid sequence that has been cleaved with a 
restriction endonuclease with a partially 
double-stranded oligonucleotide, the
single-stranded region of the oligonucleotide being
functionally complementary to the nucleic 



acid in the region that remains after 
cleavage, the double-stranded region of the 
oligonucleotide including any sequences 
necessary to return the sequences that remain 
after cleavage into proper reading frame for 
expression and containing a restriction 
endonuclease recognition site 5' of those
sequences; and 

(ii) cleaving the partially double- 
stranded oligonucleotide sequence solely at 
the restriction endonuclease cleavage site 
contained within the double-stranded region 
of the partially double-stranded 
oligonucleotide.



As before, in this object of the invention, 
the restriction endonuclease recognition site need not 
be located in the double-stranded portion of the 
oligonucleotide. Instead, it can be introduced on 
amplification with an amplification primer that is used 
to amplify the DNA-partially double-stranded 
oligonucleotide combination. 

It is another object of this invention to 
prepare libraries that display, express or comprise a
diverse family of peptides, polypeptides or proteins 
and collectively display, express or comprise at least 
part of the diversity of the family, using the methods 
and DNAs described above. 

It is an object of this invention to screen 
those libraries to identify useful peptides, 
polypeptides and proteins and to use those substances 
in human therapy. 

Additional objects of the invention are 
reflected in claims 1-116. Each of these claims is 
specifically incorporated by reference in this
specification. 

BRIEF DESCRIPTION OF THE DRAWINGS 

FIG. 1 is a schematic of various methods that 
may be employed to amplify VH genes without using 
primers specific for VH sequences. 

FIG. 2 is a schematic of various methods that 
may be employed to amplify VL genes without using 
primers specific for VL sequences. 

FIG. 3 is a schematic of RACE amplification 
of antibody heavy and light chains. 

FIG. 4 depicts gel analysis of amplification 
products obtained after the primary PCR reaction from 4 
different patient samples. 

FIG. 5 depicts gel analysis of cleaved kappa 
DNA from Example 2. 

FIG. 6 depicts gel analysis of extender- 
cleaved kappa DNA from Example 2. 

FIG. 7 depicts gel analysis of the PCR 
product from the extender-kappa amplification from 
Example 2.

FIG. 8 depicts gel analysis of purified PCR 
product from the extender-kappa amplification from 
Example 2.

FIG. 9 depicts gel analysis of cleaved and 
ligated kappa light chains from Example 2. 

FIG. 10 is a schematic of the design for CDR1 
and CDR2 synthetic diversity. 

FIG. 11 is a schematic of the cloning
schedule for construction of the heavy chain
repertoire.

FIG. 12 is a schematic of the cleavage and 
ligation of the antibody light chain. 

FIG. 13 depicts gel analysis of cleaved and
ligated lambda light chains from Example 4. 

FIG. 14 is a schematic of the cleavage and 
ligation of the antibody heavy chain. 

FIG. 15 depicts gel analysis of cleaved and 
ligated lambda light chains from Example 5. 

FIG. 16 is a schematic of a phage display
vector.

FIG. 17 is a schematic of a Fab cassette. 

FIG. 18 is a schematic of a process for 
incorporating fixed FR1 residues in an antibody lambda 
sequence. 

FIG. 19 is a schematic of a process for 
incorporating fixed FR1 residues in an antibody kappa 
sequence. 

FIG. 20 is a schematic of a process for 
incorporating fixed FR1 residues in an antibody heavy 
chain sequence. 



TERMS 

In this application, the following terms and
abbreviations are used:

Sense strand        The upper strand of ds DNA as
                    usually written. In the sense
                    strand, 5'-ATG-3' codes for
                    Met.

Antisense strand    The lower strand of ds DNA as
                    usually written. In the
                    antisense strand, 3'-TAC-5'
                    would correspond to a Met
                    codon in the sense strand.

Forward primer      A "forward" primer is
                    complementary to a part of the
                    sense strand and primes for
                    synthesis of a new
                    antisense-strand molecule.
                    "Forward primer" and
                    "lower-strand primer" are
                    equivalent.

Backward primer     A "backward" primer is
                    complementary to a part of the
                    antisense strand and primes
                    for synthesis of a new
                    sense-strand molecule.
                    "Backward primer" and
                    "top-strand primer" are
                    equivalent.

Bases               Bases are specified either by
                    their position in a vector or
                    gene, or by their position
                    within a gene by codon and
                    base. For example, "89.1" is
                    the first base of codon 89,
                    and "89.2" is the second base
                    of codon 89.

Sv                  Streptavidin

Ap                  Ampicillin

apR                 A gene conferring ampicillin
                    resistance.

RERS                Restriction endonuclease
                    recognition site

RE                  Restriction endonuclease -
                    cleaves preferentially at RERS

URE                 Universal restriction
                    endonuclease

Functionally        Two sequences are sufficiently
complementary       complementary so as to anneal
                    under the chosen conditions.

AA                  Amino acid

PCR                 Polymerase chain reaction

GLGs                Germline genes

Ab                  Antibody: an immunoglobulin.
                    The term also covers any
                    protein having a binding
                    domain which is homologous to
                    an immunoglobulin binding
                    domain. A few examples of
                    antibodies within this
                    definition are, inter alia,
                    immunoglobulin isotypes and
                    the Fab, F(ab')2, scFv, Fv,
                    dAb and Fd fragments.

Fab                 Two chain molecule comprising
                    an Ab light chain and part of
                    a heavy chain.

scFv                A single-chain Ab comprising
                    either VH::linker::VL or
                    VL::linker::VH

w.t.                Wild type

HC                  Heavy chain

LC                  Light chain

VK                  A variable domain of a Kappa
                    light chain.

VH                  A variable domain of a heavy
                    chain.

VL                  A variable domain of a lambda
                    light chain.
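
The codon-and-base notation defined under "Bases" maps directly onto a nucleotide offset. The sketch below is illustrative only; the function name is hypothetical, and it assumes that codon 1 begins at offset 0 of the coding sequence, which the text does not state.

```python
def codon_base_to_offset(label):
    """Convert a "codon.base" label such as "89.1" to a 0-based offset.

    Assumption (not from the text): codon 1, base 1 is offset 0.
    """
    codon_text, base_text = label.split(".")
    codon, base = int(codon_text), int(base_text)
    if codon < 1 or base not in (1, 2, 3):
        raise ValueError(f"invalid codon.base label: {label!r}")
    return (codon - 1) * 3 + (base - 1)


print(codon_base_to_offset("89.1"))  # first base of codon 89
print(codon_base_to_offset("89.2"))  # second base of codon 89
```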

In this application when it is said that 
nucleic acids are cleaved solely at the cleavage site 
of a restriction endonuclease, it should be understood 
that minor cleavage may occur at random, e.g., at non- 
specific sites other than the specific cleavage site 
that is characteristic of the restriction endonuclease. 
The skilled worker will recognize that such non- 
specific, random cleavage is the usual occurrence. 
Accordingly, "solely at the cleavage site" of a 
restriction endonuclease means that cleavage occurs 
preferentially at the site characteristic of that 
endonuclease . 

As used in this application and claims, the 
term "cleavage site formed by the complementation of 
the nucleic acid and the single-stranded region of the 
oligonucleotide" includes cleavage sites formed by the
single-stranded portion of the partially
double-stranded oligonucleotide duplexing with the
single-stranded DNA, cleavage sites in the
double-stranded portion of the partially
double-stranded oligonucleotide, and cleavage sites
introduced by the amplification primer used to amplify
the single-stranded DNA-partially double-stranded
oligonucleotide combination.

In the two methods of this invention for
preparing single-stranded nucleic acid sequences, the
first of those cleavage sites is preferred. In the
methods of this invention for capturing diversity and
cloning a family of diverse nucleic acid sequences, the
latter two cleavage sites are preferred.

In this application, all references referred
to are specifically incorporated by reference.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

The nucleic acid sequences that are useful in
the methods of this invention, i.e., those that encode
at least in part the individual peptides, polypeptides
and proteins displayed, or expressed in or comprising
the libraries of this invention, may be native,
synthetic or a combination thereof. They may be mRNA,
DNA or cDNA. In the preferred embodiment, the nucleic
acids encode antibodies. Most preferably, they encode
Fabs.

The nucleic acids useful in this invention
may be naturally diverse, synthetic diversity may be
introduced into those naturally diverse members, or the
diversity may be entirely synthetic. For example,
synthetic diversity can be introduced into one or more
CDRs of antibody genes. Preferably, it is introduced
into CDR1 and CDR2 of immunoglobulins. Preferably,
natural diversity is captured in the CDR3 regions of
the immunoglobulin genes of this invention from B cells.
Most preferably, the nucleic acids of this invention
comprise a population of immunoglobulin genes that
comprise synthetic diversity in at least one, and more
preferably both of the CDR1 and CDR2 and diversity in
CDR3 captured from B cells.

Synthetic diversity may be created, for
example, through the use of TRIM technology (U.S.
5,869,644). TRIM technology allows control over
exactly which amino-acid types are allowed at
variegated positions and in what proportions. In TRIM
technology, codons to be diversified are synthesized
using mixtures of trinucleotides. This allows any set
of amino acid types to be included in any proportion.

Another alternative that may be used to
generate diversified DNA is mixed oligonucleotide
synthesis. With TRIM technology, one could allow Ala
and Trp. With mixed oligonucleotide synthesis, a
mixture that included Ala and Trp would also
necessarily include Ser and Gly. The amino-acid types
allowed at the variegated positions are picked with
reference to the structure of antibodies, or other
peptides, polypeptides or proteins of the family, the
observed diversity in germline genes, the somatic
mutations frequently observed, and the desired
areas and types of variegation.
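
The Ala/Trp example can be checked by enumerating codons. The sketch below uses the standard genetic code and is illustrative only: the function names and the particular codons chosen (GCG or GCT for Ala, TGG for Trp) are assumptions made for the example, not taken from this specification. Mixing bases position by position yields every combination of the allowed bases, whereas a trinucleotide mixture yields only the codons supplied.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def mixed_base_amino_acids(position_mixtures):
    """Amino acids encoded when each codon position is a mixture of bases."""
    codons = ("".join(c) for c in product(*position_mixtures))
    return {CODON_TABLE[codon] for codon in codons}


def trinucleotide_amino_acids(trinucleotides):
    """Amino acids encoded when whole codons are mixed (TRIM-style)."""
    return {CODON_TABLE[codon] for codon in trinucleotides}


# Mixed bases covering Ala (GCG) and Trp (TGG): G/T, C/G, G.
print(sorted(mixed_base_amino_acids(["GT", "CG", "G"])))
# Trinucleotide mixture with one Ala codon and the Trp codon.
print(sorted(trinucleotide_amino_acids(["GCT", "TGG"])))
```

The first call returns the one-letter codes for Ala, Gly, Ser and Trp; the second returns only Ala and Trp.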

In a preferred embodiment of this invention,
the nucleic acid sequences for at least one CDR or
other region of the peptides, polypeptides or proteins
of the family are cDNAs produced by reverse
transcription from mRNA. More preferably, the mRNAs
are obtained from peripheral blood cells, bone marrow
cells, spleen cells or lymph node cells (such as
B-lymphocytes or plasma cells) that express members of
naturally diverse sets of related genes. More
preferably, the mRNAs encode a diverse family of
antibodies. Most preferably, the mRNAs are obtained
from patients suffering from at least one autoimmune
disorder or cancer. Preferably, mRNAs containing a
high diversity of autoimmune diseases, such as systemic
lupus erythematosus, systemic sclerosis, rheumatoid
arthritis, antiphospholipid syndrome and vasculitis are
used.

In a preferred embodiment of this invention,
the cDNAs are produced from the mRNAs using reverse
transcription. In this preferred embodiment, the mRNAs
are separated from the cell and degraded using standard
methods, such that only the full length (i.e., capped)
mRNAs remain. The cap is then removed and reverse
transcription used to produce the cDNAs.

The reverse transcription of the first
(antisense) strand can be done in any manner with any
suitable primer. See, e.g., HJ de Haard et al.,
Journal of Biological Chemistry, 274(26):18218-30
(1999). In the preferred embodiment of this invention
where the mRNAs encode antibodies, primers that are
complementary to the constant regions of antibody genes
may be used. Those primers are useful because they do
not generate bias toward subclasses of antibodies. In
another embodiment, poly-dT primers may be used (and
may be preferred for the heavy-chain genes).
Alternatively, sequences complementary to the primer
may be attached to the termini of the antisense strand.

In one preferred embodiment of this
invention, the reverse transcriptase primer may be
biotinylated, thus allowing the cDNA product to be
immobilized on streptavidin (Sv) beads. Immobilization
can also be effected using a primer labeled at the 5'
end with one of a) a free amine group, b) thiol, c)
carboxylic acid, or d) another group not found in DNA
that can react to form a strong bond to a known partner
on an insoluble medium. If, for example, a free amine
(preferably primary amine) is provided at the 5' end of
a DNA primer, this amine can be reacted with carboxylic
acid groups on a polymer bead using standard
amide-forming chemistry. If such preferred
immobilization is used during reverse transcription,
the top strand RNA is degraded using well-known
enzymes, such as a combination of RNAseH and RNAseA,
either before or after immobilization.

The nucleic acid sequences useful in the
methods of this invention are generally amplified
before being used to display and/or express the
peptides, polypeptides or proteins that they encode.
Prior to amplification, the single-stranded DNAs may be
cleaved using either of the methods described before.
Alternatively, the single-stranded DNAs may be
amplified and then cleaved using one of those methods.

Any of the well known methods for amplifying
nucleic acid sequences may be used for such
amplification. Methods that maximize, and do not bias,
diversity are preferred. In a preferred embodiment of
this invention where the nucleic acid sequences are
derived from antibody genes, the present invention
preferably utilizes primers in the constant regions of
the heavy and light chain genes and primers to a
synthetic sequence that is attached at the 5' end of
the sense strand. Priming at such synthetic sequence
avoids the use of sequences within the variable regions
of the antibody genes. Those variable region priming
sites generate bias against V genes that are either of 
rare subclasses or that have been mutated at the 
priming sites. This bias is partly due to suppression 
of diversity within the primer region and partly due to 
lack of priming when many mutations are present in the 
region complementary to the primer. The methods 
disclosed in this invention have the advantage of not 
biasing the population of amplified antibody genes for 
particular V gene types. 

The synthetic sequences may be attached to
the 5' end of the DNA strand by various methods well
known for ligating DNA sequences together. RT
CapExtention is one preferred method.

In RT CapExtention (derived from Smart
PCR(TM)), a short overlap (5'-...GGG-3' in the
upper-strand primer (USP-GGG) complements 3'-CCC...-5'
in the lower strand) and reverse transcriptases are
used so that the reverse complement of the upper-strand
primer is attached to the lower strand.

FIGs. 1 and 2 show schematics to amplify VH
and VL genes using RT CapExtention. FIG. 1 shows a
schematic of the amplification of VH genes. FIG. 1,
Panel A shows a primer specific to the poly-dT region
of the 3' UTR priming synthesis of the first, lower
strand. Primers that bind in the constant region are
also suitable. Panel B shows the lower strand extended
at its 3' end by three Cs that are not complementary to
the mRNA. Panel C shows the result of annealing a
synthetic top-strand primer ending in GGG that
hybridizes to the 3' terminal CCC and continuing the
reverse transcription, extending the lower strand by the
reverse complement of the synthetic primer sequence.
Panel D shows the result of PCR amplification using a
5' biotinylated synthetic top-strand primer that
replicates the 5' end of the synthetic primer of panel
C and a bottom-strand primer complementary to part of
the constant domain. Panel E shows immobilized
double-stranded (ds) cDNA obtained by using a
5'-biotinylated top-strand primer.

FIG. 2 shows a similar schematic for
amplification of VL genes. FIG. 2, Panel A shows a
primer specific to the constant region at or near the
3' end priming synthesis of the first, lower strand.
Primers that bind in the poly-dT region are also
suitable. Panel B shows the lower strand extended at
its 3' end by three Cs that are not complementary to
the mRNA. Panel C shows the result of annealing a
synthetic top-strand primer ending in GGG that
hybridizes to the 3' terminal CCC and continuing the
reverse transcription, extending the lower strand by the
reverse complement of the synthetic primer sequence.
Panel D shows the result of PCR amplification using a
5' biotinylated synthetic top-strand primer that
replicates the 5' end of the synthetic primer of panel
C and a bottom-strand primer complementary to part of
the constant domain. The bottom-strand primer also
contains a useful restriction endonuclease site, such
as AscI. Panel E shows immobilized ds cDNA obtained by
using a 5'-biotinylated top-strand primer.

In FIGs. 1 and 2, each V gene consists of a
5' untranslated region (UTR) and a secretion signal,
followed by the variable region, followed by a constant
region, followed by a 3' untranslated region (which
typically ends in poly-A). An initial primer for
reverse transcription may be complementary to the
constant region or to the poly A segment of the 3'-UTR.
For human heavy-chain genes, a primer of 15 T is
preferred. Reverse transcriptases attach several C
residues to the 3' end of the newly synthesized DNA.
RT CapExtention exploits this feature. The reverse
transcription reaction is first run with only a
lower-strand primer. After about 1 hour, a primer
ending in GGG (USP-GGG) and more RTase are added. This
causes the lower-strand cDNA to be extended by the
reverse complement of the USP-GGG up to the final GGG.
Using one primer identical to part of the attached
synthetic sequence and a second primer complementary to
a region of known sequence at the 3' end of the sense
strand, all the V genes are amplified irrespective of
their V gene subclass.
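
The strand bookkeeping in RT CapExtention can be followed with a small string model. This is a sketch under simplifying assumptions, not a description of the reaction chemistry: the mRNA is written with DNA letters, exactly three C residues are added (the text says "several"), the first strand is taken to cover the whole template, and the sequences and function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA string written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def rt_cap_extension(sense_template, usp_ggg):
    """Model the lower (antisense) strand after RT CapExtention.

    sense_template: mRNA sequence written with DNA letters, 5' to 3'.
    usp_ggg: upper-strand primer, which must end in GGG.
    Returns the extended lower strand, 5' to 3'.
    """
    if not usp_ggg.endswith("GGG"):
        raise ValueError("upper-strand primer must end in GGG")
    first_strand = reverse_complement(sense_template)  # lower-strand cDNA
    c_tailed = first_strand + "CCC"                    # non-templated Cs at 3' end
    # The primer's GGG pairs with the CCC tail; the lower strand is then
    # extended by the reverse complement of the rest of the primer.
    return c_tailed + reverse_complement(usp_ggg[:-3])


lower = rt_cap_extension("ATGGCC", "ACGTTGGG")
print(lower)
print(reverse_complement(lower))  # upper strand: primer followed by template
```

The second printed line shows why a primer identical to the synthetic sequence can then prime every template: the upper strand now begins with the USP-GGG sequence regardless of the V gene that follows.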

In another preferred embodiment, synthetic
sequences may be added by Rapid Amplification of cDNA
Ends (RACE) (see Frohman, M.A., Dush, M.K., & Martin,
G.R. (1988) Proc. Natl. Acad. Sci. USA (85):
8998-9002).

FIG. 3 shows a schematic of RACE
amplification of antibody heavy and light chains.
First, mRNA is selected by treating total or poly(A+)
RNA with calf intestinal phosphatase (CIP) to remove
the 5'-phosphate from all molecules that have them such
as ribosomal RNA, fragmented mRNA, tRNA and genomic
DNA. Full length mRNA (containing a protective
7-methyl cap structure) is unaffected. The RNA is then
treated with tobacco acid pyrophosphatase (TAP) to
remove the cap structure from full length mRNAs leaving
a 5'-monophosphate group. Next, a synthetic RNA
adaptor is ligated to the RNA population; only
molecules which have a 5'-phosphate (uncapped, full
length mRNAs) will accept the adaptor. Reverse
transcriptase reactions using an oligodT primer, and
nested PCR (using one adaptor primer (located in the 5'
synthetic adaptor) and one primer for the gene) are
then used to amplify the desired transcript.

In a preferred embodiment of this invention,
the upper strand or lower strand primer may be also
biotinylated or labeled at the 5' end with one of a)
free amino group, b) thiol, c) carboxylic acid and d)
another group not found in DNA that can react to form a
strong bond to a known partner on an insoluble medium.
These can then be used to immobilize the labeled strand
after amplification. The immobilized DNA can be either
single or double-stranded.

After amplification (using e.g., RT
CapExtension or RACE), the DNAs of this invention are
rendered single-stranded. For example, the strands can
be separated by using a biotinylated primer, capturing
the biotinylated product on streptavidin beads,
denaturing the DNA, and washing away the complementary
strand. Depending on which end of the captured DNA is
wanted, one will choose to immobilize either the upper
(sense) strand or the lower (antisense) strand.

To prepare the single-stranded amplified DNAs
for cloning into genetic packages so as to effect
display of, or for expression of, the peptides,
polypeptides or proteins encoded, at least in part, by
those DNAs, they must be manipulated to provide ends
suitable for cloning and display and/or expression. In
particular, any 5' untranslated regions and mammalian
signal sequences must be removed and replaced, in
frame, by a suitable signal sequence that functions in
the display or expression host. Additionally, parts of
the variable domains (in antibody genes) may be removed
and replaced by synthetic segments containing synthetic
diversity. The diversity of other gene families may
likewise be expanded with synthetic diversity.

According to the methods of this invention,
there are two ways to manipulate the single-stranded
DNAs for display and/or expression. The first method
comprises the steps of:

(i) contacting the nucleic acid with a
single-stranded oligonucleotide, the
oligonucleotide being functionally
complementary to the nucleic acid in the
region in which cleavage is desired and
including a sequence that with its complement
in the nucleic acid forms a restriction
endonuclease recognition site that on
restriction results in cleavage of the
nucleic acid at the desired location; and

(ii) cleaving the nucleic acid solely at
the recognition site formed by the
complementation of the nucleic acid and the
oligonucleotide;

the contacting and the cleaving steps being performed
at a temperature sufficient to maintain the nucleic
acid in substantially single-stranded form, the
oligonucleotide being functionally complementary to the
nucleic acid over a large enough region to allow the
two strands to associate such that cleavage may occur
at the chosen temperature and at the desired location,
and the cleavage being carried out using a restriction
endonuclease that is active at the chosen temperature.

In this first method, short oligonucleotides
are annealed to the single-stranded DNA so that
restriction endonuclease recognition sites formed
within the now locally double-stranded regions of the
DNA can be cleaved. In particular, a recognition site
that occurs at the same position in a substantial
fraction of the single-stranded DNAs is identified.

For antibody genes, this can be done using a
catalog of germline sequences. See, e.g.,
"http://www.mrc-cpe.cam.ac.uk/imt-doc/restricted/ok.html."
Updates can be obtained from this site under the
heading "Amino acid and nucleotide sequence
alignments." For other families, similar comparisons
exist and may be used to select appropriate regions for
cleavage and to maintain diversity.

For example, Table 1 depicts the DNA 
sequences of the FR3 regions of the 51 known human VH 
germline genes. In this region, the genes contain 
restriction endonuclease recognition sites shown in 
Table 2. Restriction endonucleases that cleave a large 
fraction of germline genes at the same site are 
preferred over endonucleases that cut at a variety of 
sites. Furthermore, it is preferred that there be only 
one site for the restriction endonucleases within the 
region to which the short oligonucleotide binds on the 
single-stranded DNA, e.g., about 10 bases on either 
side of the restriction endonuclease recognition site. 

An enzyme that cleaves downstream in FR3 is
also more preferable because it captures fewer
mutations in the framework. This may be advantageous
in some cases. However, it is well known that
framework mutations exist and can confer or enhance
antibody binding. The present invention, by choice of
appropriate restriction site, allows all or part of FR3
diversity to be captured. Hence, the method also
allows extensive diversity to be captured.

Finally, in the methods of this invention
restriction endonucleases that are active between about
37°C and about 75°C are used. Preferably, restriction
endonucleases that are active between about 45°C and
about 75°C may be used. More preferably, enzymes that
are active above 50°C, and most preferably active about
55°C, are used. Such temperatures maintain the nucleic
acid sequence to be cleaved in substantially
single-stranded form.

Enzymes shown in Table 2 that cut many of the
heavy chain FR3 germline genes at a single position
include: MaeIII(24@4), Tsp45I(21@4), HphI(44@5),
BsaJI(23@65), AluI(23@47), BlpI(21@48), DdeI(29@58),
BglII(10@61), MslI(44@72), BsiEI(23@74), EaeI(23@74),
EagI(23@74), HaeIII(25@75), Bst4CI(51@86),
HpyCH4III(51@86), HinfI(38@2), MlyI(18@2), PleI(18@2),
MnlI(31@67), HpyCH4V(21@44), BsmAI(16@11), BpmI(19@12),
XmnI(12@30), and SacI(11@51). (The notation used
means, for example, that BsmAI cuts 16 of the FR3
germline genes with a restriction endonuclease
recognition site beginning at base 11 of FR3.)
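
By way of illustration only, the "number@position" notation can be reproduced with a short script. The following Python sketch is not part of the disclosed methods. The function names and the toy sequences are hypothetical and are not taken from Table 1 or Table 2, and the sketch assumes the sequences are already aligned so that position 1 is the first base of the region examined.

```python
import re
from collections import Counter

# IUPAC nucleotide codes mapped to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

def site_regex(site):
    """Compile a recognition site (e.g. 'ACNGT') so overlapping hits are found."""
    body = "".join(IUPAC[base] for base in site.upper())
    return re.compile("(?=(" + body + "))")

def site_census(sequences, site):
    """For each 1-based start position, count the sequences carrying the site there."""
    pattern = site_regex(site)
    counts = Counter()
    for seq in sequences:
        for match in pattern.finditer(seq.upper()):
            counts[match.start() + 1] += 1
    return counts

def format_census(counts):
    """Render counts as 'number@position', most frequent position first."""
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return ", ".join(f"{n}@{pos}" for pos, n in ordered)

# Hypothetical toy sequences (illustration only).
toy_genes = ["GGCACTGTAA", "GGCACAGTAA", "GGCTCAGTAA", "ACGGTCCCAA"]
print(format_census(site_census(toy_genes, "ACNGT")))
```

An enzyme whose census is dominated by a single position is the kind the text describes as preferred, since one annealed oligonucleotide can then direct cleavage of most family members at the same place.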

For cleavage of human heavy chains in FR3,
the preferred restriction endonucleases are: Bst4CI (or
TaaI or HpyCH4III), BlpI, HpyCH4V, and MslI. Because
ACNGT (the restriction endonuclease recognition site
for Bst4CI, TaaI, and HpyCH4III) is found at a
consistent site in all the human FR3 germline genes,
one of those enzymes is the most preferred for capture
of heavy chain CDR3 diversity. BlpI and HpyCH4V are
complementary. BlpI cuts most members of the VH1 and
VH4 families while HpyCH4V cuts most members of the
VH3, VH5, VH6, and VH7 families. Neither enzyme cuts
VH2s, but this is a very small family, containing only
three members. Thus, these enzymes may also be used in
preferred embodiments of the methods of this invention.



The restriction endonucleases HpyCH4III,
Bst4CI, and TaaI all recognize 5'-ACnGT-3' and cut
upper strand DNA after n and lower strand DNA before
the base complementary to n. This is the most
preferred restriction endonuclease recognition site for
this method on human heavy chains because it is found
in all germline genes. Furthermore, the restriction
endonuclease recognition region (ACnGT) matches the
second and third bases of a tyrosine codon (tay) and
the following cysteine codon (tgy) as shown in Table 3.
These codons are highly conserved, especially the
cysteine in mature antibody genes.
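
The overlap between ACnGT and the tyrosine and cysteine codons can be checked by enumeration. The following Python sketch is illustrative only. It relies on the standard genetic code (tyrosine TAT or TAC, cysteine TGT or TGC) rather than on Table 3, and the function name is hypothetical.

```python
from itertools import product

TYR_CODONS = ["TAT", "TAC"]  # tyrosine, written "tay" in the text
CYS_CODONS = ["TGT", "TGC"]  # cysteine, written "tgy" in the text

def contains_acngt(dna):
    """Return True if the sequence contains ACNGT, where N is any base."""
    return any(
        dna[i:i + 2] == "AC" and dna[i + 3:i + 5] == "GT"
        for i in range(len(dna) - 4)
    )

# Test every tyrosine codon followed directly by every cysteine codon.
pairs_with_site = [
    (tyr, cys)
    for tyr, cys in product(TYR_CODONS, CYS_CODONS)
    if contains_acngt(tyr + cys)
]
print(pairs_with_site)
```

Under these assumptions only the pair TAC followed by TGT carries the site, with the site spanning the last two bases of the tyrosine codon and the whole cysteine codon.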

Table 4E shows the distinct oligonucleotides,
each 22 bases long (except the last one, which is 20
bases long). Table 5C shows the analysis of 1617 actual
heavy chain antibody genes. Of these, 1511 have the
site and match one of the candidate oligonucleotides to
within 4 mismatches. Eight oligonucleotides account
for most of the matches and are given in Table 4F.1.
The 8 oligonucleotides are very similar so that it is
likely that satisfactory cleavage will be achieved with
only one oligonucleotide (such as H43.77.97.1-02#1) by
adjusting temperature, pH, salinity, and the like. One
or two oligonucleotides may likewise suffice whenever
the germline gene sequences differ very little and
especially if they differ very little close to the
restriction endonuclease recognition region to be
cleaved. Table 5D shows a repeat analysis of 1617
actual heavy chain antibody genes using only the 8
chosen oligonucleotides. This shows that 1463 of the
sequences match at least one of the oligonucleotides to
within 4 mismatches and have the site as expected.
Only 7 sequences have a second HpyCH4III restriction
endonuclease recognition region in this region.

Another illustration of choosing an
appropriate restriction endonuclease recognition site
involves cleavage in FR1 of human heavy chains.
Cleavage in FR1 allows capture of the entire CDR
diversity of the heavy chain.

The germline genes for human heavy chain FR1
are shown in Table 6. Table 7 shows the restriction
endonuclease recognition sites found in human germline
gene FR1s. The preferred sites are BsgI(GTGCAG;39@4),
BsoFI(GCngc;43@6,11@9,2@3,1@12),
TseI(Gcwgc;43@6,11@9,2@3,1@12),
MspA1I(CMGckg;46@7,2@1), PvuII(CAGctg;46@7,2@1),
AluI(AGct;48@8,2@2), DdeI(Ctnag;22@52,9@48),
HphI(tcacc;22@80), BssKI(Nccngg;35@39,2@40),
BsaJI(Ccnngg;32@40,2@41), BstNI(CCwgg;33@40),
ScrFI(CCngg;35@40,2@41), EcoO109I(RGgnccy;22@46,11@43),
Sau96I(Ggncc;23@47,11@44), AvaII(Ggwcc;23@47,4@44),
PpuMI(RGgwccy;22@46,4@43), BsmFI(gtccc;20@48),
HinfI(Gantc;34@16,21@56,21@77), TfiI(21@77),
MlyI(GAGTC;34@16), MlyI(gactc;21@56), and
AlwNI(CAGnnnctg;22@68). The more preferred sites are
MspAI and PvuII. MspAI and PvuII have 46 sites at 7-12
and 2 at 1-6. To avoid cleavage at both sites,
oligonucleotides are used that do not fully cover the
site at 1-6. Thus, the DNA will not be cleaved at that
site. We have shown that DNA that extends 3, 4, or 5
bases beyond a PvuII-site can be cleaved efficiently.
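
The choice of an oligonucleotide that covers the site at 7-12 but not the site at 1-6 can be expressed as a simple interval test. The following Python sketch is illustrative only. It assumes, as a simplification, that a site is cleavable only when every base of the site lies within the annealed oligonucleotide, and the function name and coordinates are hypothetical.

```python
def fully_covered_sites(site_starts, site_length, oligo_first, oligo_last):
    """Return the site start positions whose whole span lies inside the oligonucleotide.

    All coordinates are 1-based and inclusive, counted along the single-stranded DNA.
    """
    return [
        start
        for start in site_starts
        if oligo_first <= start and start + site_length - 1 <= oligo_last
    ]

# Six-base sites beginning at bases 1 and 7, as for the two positions described.
sites = [1, 7]

# A hypothetical oligonucleotide annealing to bases 4-25 leaves the site at 1-6
# only partly duplexed while extending 3 bases beyond the site at 7-12.
print(fully_covered_sites(sites, 6, 4, 25))

# A hypothetical oligonucleotide annealing to bases 1-22 would cover both sites.
print(fully_covered_sites(sites, 6, 1, 22))
```

In this example the first oligonucleotide leaves only the site at 7-12 available for cleavage.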

Another illustration of choosing an
appropriate restriction endonuclease recognition site
involves cleavage in FR1 of human kappa light chains.
Table 8 shows the human kappa FR1 germline genes and
Table 9 shows restriction endonuclease recognition
sites that are found in a substantial number of human
kappa FR1 germline genes at consistent locations. Of
the restriction endonuclease recognition sites listed,
BsmAI and PflFI are the most preferred enzymes. BsmAI
sites are found at base 18 in 35 of 40 germline genes.
PflFI sites are found in 35 of 40 germline genes at
base 12.

Another example of choosing an appropriate
restriction endonuclease recognition site involves
cleavage in FR1 of the human lambda light chain. Table
10 shows the 31 known human lambda FR1 germline gene
sequences. Table 11 shows restriction endonuclease
recognition sites found in human lambda FR1 germline
genes. HinfI and DdeI are the most preferred
restriction endonucleases for cutting human lambda
chains in FR1.

After the appropriate site or sites for
cleavage are chosen, one or more short oligonucleotides
are prepared so as to functionally complement, alone or
in combination, the chosen recognition site. The
oligonucleotides also include sequences that flank the
recognition site in the majority of the amplified
genes. This flanking region allows the sequence to
anneal to the single-stranded DNA sufficiently to allow
cleavage by the restriction endonuclease specific for
the site chosen.

The actual length and sequence of the
oligonucleotide depends on the recognition site and the
conditions to be used for contacting and cleavage. The
length must be sufficient so that the oligonucleotide
is functionally complementary to the single-stranded
DNA over a large enough region to allow the two strands
to associate such that cleavage may occur at the chosen
temperature and at the desired location.

Typically, the oligonucleotides of this
preferred method of the invention are about 17 to about
30 nucleotides in length. Below about 17 bases,
annealing is too weak and above 30 bases there can be a
loss of specificity. A preferred length is 18 to 24
bases.

Oligonucleotides of this length need not be
identical complements of the germline genes. Rather, a
few mismatches may be tolerated. Preferably,
however, no more than 1-3 mismatches are allowed. Such
mismatches do not adversely affect annealing of the
oligonucleotide to the single-stranded DNA. Hence, the
two DNAs are said to be functionally complementary.
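
The length and mismatch limits described above can be combined into a single screening test. The following Python sketch is illustrative only. The function names and the example target are hypothetical, the default limits (17 to 30 bases, at most 3 mismatches) are taken from the preceding paragraphs, and simple mismatch counting is an assumption standing in for the empirical annealing behavior the text relies on.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(dna):
    """Return the reverse complement of a DNA string."""
    return dna.upper().translate(COMPLEMENT)[::-1]

def count_mismatches(oligo, target_region):
    """Count positions where the oligo fails to base-pair with the target region.

    Both strings are written 5'->3' and must be the same length.
    """
    if len(oligo) != len(target_region):
        raise ValueError("oligo and target region must be the same length")
    perfect = reverse_complement(target_region)
    return sum(a != b for a, b in zip(oligo.upper(), perfect))

def is_functionally_complementary(oligo, target_region,
                                  max_mismatches=3, min_len=17, max_len=30):
    """Apply the length window and mismatch ceiling described in the text."""
    if not (min_len <= len(oligo) <= max_len):
        return False
    return count_mismatches(oligo, target_region) <= max_mismatches

# Hypothetical 20-base target region (illustration only).
target = "GCCGTGTATTACTGTGCGAG"
perfect_oligo = reverse_complement(target)
two_off = "AA" + perfect_oligo[2:]      # two deliberate mismatches
five_off = "AAAAA" + perfect_oligo[5:]  # five deliberate mismatches

print(count_mismatches(two_off, target), is_functionally_complementary(two_off, target))
print(count_mismatches(five_off, target), is_functionally_complementary(five_off, target))
```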

The second method to manipulate the
single-stranded DNAs of this invention for display and/or
expression comprises the steps of:

(i) contacting the nucleic acid with a
partially double-stranded oligonucleotide,
the single-stranded region of the
oligonucleotide being functionally
complementary to the nucleic acid in the
region in which cleavage is desired, and the
double-stranded region of the oligonucleotide
having a restriction endonuclease recognition
site; and

(ii) cleaving the nucleic acid solely at
the cleavage site formed by the
complementation of the nucleic acid and the
single-stranded region of the
oligonucleotide;



the contacting and the cleaving steps being performed 
at a temperature sufficient to maintain the nucleic 
acid in substantially single-stranded form, the 
oligonucleotide being functionally complementary to the 
nucleic acid over a large enough region to allow the 
two strands to associate such that cleavage may occur 
at the chosen temperature and at the desired location, 
and the cleavage being carried out using a restriction 
endonuclease that is active at the chosen temperature. 

As explained above, the cleavage site may be 
formed by the single-stranded portion of the partially 
double-stranded oligonucleotide duplexing with the 
single-stranded DNA, the cleavage site may be carried 
in the double-stranded portion of the partially double- 
stranded oligonucleotide, or the cleavage site may be 
introduced by the amplification primer used to amplify 
the single-stranded DNA-partially double-stranded 
oligonucleotide combination. In this embodiment, the 
first is preferred. And, the restriction endonuclease 
recognition site may be located in either the double- 
stranded portion of the oligonucleotide or introduced 
by the amplification primer, which is complementary to 
that double-stranded region, as used to amplify the 
combination. 

Preferably, the restriction endonuclease site 
is that of a Type II-S restriction endonuclease, whose 
cleavage site is located at a known distance from its 
recognition site. 
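
The fixed offset between a Type II-S recognition site and its cleavage site can be illustrated with FokI. The following Python sketch is illustrative only. The FokI recognition sequence (GGATG) and its cut offsets (9 bases downstream on the strand carrying GGATG and 13 on the opposite strand) come from general enzyme reference data and are not stated in this section. The function name and example sequence are hypothetical, and only a site in the forward orientation is handled.

```python
FOKI_SITE = "GGATG"
TOP_OFFSET = 9      # bases between the site and the cut on the GGATG strand
BOTTOM_OFFSET = 13  # bases between the site and the cut on the opposite strand

def foki_cut_positions(top_strand):
    """Return (top_cut, bottom_cut) as 0-based indices into the top strand.

    Each cut falls immediately before the returned index. Returns None if the
    forward-orientation site is absent or the sequence is too short to be cut.
    """
    seq = top_strand.upper()
    start = seq.find(FOKI_SITE)
    if start < 0:
        return None
    site_end = start + len(FOKI_SITE)
    top_cut = site_end + TOP_OFFSET
    bottom_cut = site_end + BOTTOM_OFFSET
    if bottom_cut > len(seq):
        return None
    return top_cut, bottom_cut

# Hypothetical sequence: two leading bases, the site, then 20 downstream bases.
example = "AA" + FOKI_SITE + "ACGTACGTACGTACGTACGT"
print(foki_cut_positions(example))
```

Because the cut positions depend only on where the recognition site sits, the sequence actually cleaved can be chosen freely, which is the property the second method exploits.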

This second method, preferably, employs
Universal Restriction Endonucleases ("URE"). UREs are
partially double-stranded oligonucleotides. The
single-stranded portion or overlap of the URE consists
of a DNA adapter that is functionally complementary to
the sequence to be cleaved in the single-stranded DNA.
The double-stranded portion consists of a restriction
endonuclease recognition site, preferably Type II-S.

The URE method of this invention is specific
and precise and can tolerate some (e.g., 1-3)
mismatches in the complementary regions, i.e., it is
functionally complementary to that region. Further,
conditions under which the URE is used can be adjusted
so that most of the genes that are amplified can be
cut, reducing bias in the library produced from those
genes.

The sequence of the single-stranded DNA
adapter or overlap portion of the URE typically
consists of about 14-22 bases. However, longer or
shorter adapters may be used. The size depends on the
ability of the adapter to associate with its functional
complement in the single-stranded DNA at the
temperature used for contacting the URE and the
single-stranded DNA and at the temperature used for
cleaving the DNA with the restriction enzyme. The
adapter must be functionally complementary to the
single-stranded DNA over a large enough region to allow
the two strands to associate such that the cleavage may
occur at the chosen temperature and at the desired
location. We prefer single-stranded or overlap portions
of 14-17 bases in length, and more preferably 18-20
bases in length.

The site chosen for cleavage using the URE is
preferably one that is substantially conserved in the
family of amplified DNAs. As compared to the first
cleavage method of this invention, these sites do not
need to be endonuclease recognition sites. However,
like the first method, the sites chosen can be
synthetic rather than existing in the native DNA. Such
sites may be chosen by reference to the sequences of
known antibodies or other families of genes. For
example, the sequences of many germline genes are
reported at
http://www.mrc-cpe.cam.ac.uk/imt-doc/restricted/ok.html.
For example, one preferred site occurs near the end of
FR3: codon 89 through the second base of codon 93.
CDR3 begins at codon 95.

The sequences of 79 human heavy-chain genes
are also available at
http://www.ncbi.nlm.nih.gov/entrez/nucleotide.html.
This site can be used to identify appropriate sequences
for URE cleavage according to the methods of this
invention. See, e.g., Table 12B.

Most preferably, one or more sequences are
identified using these sites or other available
sequence information. These sequences together are
present in a substantial fraction of the amplified
DNAs. For example, multiple sequences could be used to
allow for known diversity in germline genes or for
frequent somatic mutations. Synthetic degenerate
sequences could also be used. Preferably, a
sequence(s) that occurs in at least 65% of genes
examined with no more than 2-3 mismatches is chosen.

URE single-stranded adapters or overlaps are
then made to be complementary to the chosen regions.
Conditions for using the UREs are determined
empirically. These conditions should allow cleavage of
DNA that contains the functionally complementary
sequences with no more than 2 or 3 mismatches but
should not allow cleavage of DNA lacking such sequences.

As described above, the double-stranded
portion of the URE includes an endonuclease recognition
site, preferably a Type II-S recognition site. Any
enzyme that is active at a temperature necessary to
maintain the single-stranded DNA substantially in that
form and to allow the single-stranded DNA adapter
portion of the URE to anneal long enough to the
single-stranded DNA to permit cleavage at the desired site may
be used.

The preferred Type II-S enzymes for use in
the URE methods of this invention provide asymmetrical
cleavage of the single-stranded DNA. Among these are
the enzymes listed in Table 13. The most preferred
Type II-S enzyme is FokI.

When the preferred FokI-containing URE is
used, several conditions are preferably used to effect
cleavage:

1) Excess of the URE over target DNA should be
present to activate the enzyme. URE present
only in equimolar amounts to the target DNA
would yield poor cleavage of ssDNA because
the amount of active enzyme available would
be limiting.

2) An activator may be used to activate part of
the FokI enzyme to dimerize without causing
cleavage. Examples of appropriate activators
are shown in Table 14.

3) The cleavage reaction is performed at a
temperature between 45°C and 75°C, preferably
above 50°C and most preferably above 55°C.

The UREs used in the prior art contained a
14-base single-stranded segment, a 10-base stem
(containing a FokI site), followed by the palindrome of
the 10-base stem. While such UREs may be used in the
methods of this invention, the preferred UREs of this
invention also include a segment of three to eight
bases (a loop) between the FokI restriction
endonuclease recognition site containing segments. In
the preferred embodiment, the stem (containing the FokI
site) and its palindrome are also longer than 10 bases.
Preferably, they are 10-14 bases in length. Examples
of these "lollipop" URE adapters are shown in Table 15.

One example of using a URE to cleave a
single-stranded DNA involves the FR3 region of human
heavy chain. Table 16 shows an analysis of 840
full-length mature human heavy chains with the URE
recognition sequences shown. The vast majority
(718/840=0.85) will be recognized with 2 or fewer
mismatches using five UREs (VHS881-1.1, VHS881-1.2,
VHS881-2.1, VHS881-4.1, and VHS881-9.1). Each has a
20-base adaptor sequence to complement the germline
gene, a ten-base stem segment containing a FokI site, a
five base loop, and the reverse complement of the first
stem segment. Annealing those adapters, alone or in
combination, to single-stranded antisense heavy chain
DNA and treating with FokI in the presence of, e.g.,
the activator FOKIact, will lead to cleavage of the
antisense strand at the position indicated.

Another example of using a URE(s) to cleave a
single-stranded DNA involves the FR1 region of the
human kappa light chains. Table 17 shows an analysis
of 182 full-length human kappa chains for matching by
the four 19-base probe sequences shown. Ninety-six
percent of the sequences match one of the probes with 2
or fewer mismatches. The URE adapters shown in Table
17 are for cleavage of the sense strand of kappa
chains. Thus, the adaptor sequences are the reverse
complement of the germline gene sequences. The URE
consists of a ten-base stem, a five base loop, the
reverse complement of the stem and the complementation
sequence. The loop shown here is TTGTT, but other
sequences could be used. Its function is to interrupt
the palindrome of the stems so that formation of a
lollipop monomer is favored over dimerization. Table
17 also shows where the sense strand is cleaved.
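
The arrangement of stem, loop, reverse-complement stem and complementation sequence can be made concrete by assembling a URE as a single string. The following Python sketch is illustrative only. The stem and adapter sequences are hypothetical and are not taken from Table 15 or Table 17, the function names are assumptions, and the check for GGATG or its complement reflects the general FokI recognition sequence rather than anything stated in this section.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(dna):
    """Return the reverse complement of a DNA string."""
    return dna.upper().translate(COMPLEMENT)[::-1]

def build_lollipop_ure(adapter, stem, loop="TTGTT", adapter_first=False):
    """Assemble stem + loop + reverse-complement stem, joined to the adapter.

    adapter_first=False gives the order described for the kappa adapters
    (hairpin, then complementation sequence); True places the adapter first.
    """
    stem = stem.upper()
    if "GGATG" not in stem and "CATCC" not in stem:
        raise ValueError("stem does not contain a FokI recognition sequence")
    hairpin = stem + loop.upper() + reverse_complement(stem)
    return adapter.upper() + hairpin if adapter_first else hairpin + adapter.upper()

# Hypothetical ten-base stem and 19-base adapter (illustration only).
stem = "CACATCCGTG"
adapter = "GACTGAGTCATCTGGAGAC"
ure = build_lollipop_ure(adapter, stem)
print(ure, len(ure))
```

The loop separates the two stem arms so that the oligonucleotide can fold back on itself, which is the monomeric "lollipop" form the text describes.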

Another example of using a URE to cleave a
single-stranded DNA involves the human lambda light
chain. Table 18 shows analysis of 128 human lambda
light chains for matching the four 19-base probes
shown. With three or fewer mismatches, 88 of 128 (69%)
of the chains match one of the probes. Table 18 also
shows URE adapters corresponding to these probes.
Annealing these adapters to upper-strand ssDNA of
lambda chains and treatment with FokI in the presence
of FOKIact at a temperature at or above 45°C will lead
to specific and precise cleavage of the chains.

The conditions under which the short 
oligonucleotide sequences of the first method and the 
UREs of the second method are contacted with the 
single-stranded DNAs may be empirically determined. 
The conditions must be such that the single-stranded 
DNA remains in substantially single-stranded form. 
More particularly, the conditions must be such that the 
single-stranded DNA does not form loops that may 
interfere with its association with the oligonucleotide 
sequence or the URE or that may themselves provide 
sites for cleavage by the chosen restriction
endonuclease.

The effectiveness and specificity of short
oligonucleotides (first method) and UREs (second
method) can be adjusted by controlling the
concentrations of the URE adapters/oligonucleotides and
substrate DNA, the temperature, the pH, the
concentration of metal ions, the ionic strength, the
concentration of chaotropes (such as urea and
formamide), the concentration of the restriction
endonuclease (e.g., FokI), and the time of the
digestion. These conditions can be optimized with
synthetic oligonucleotides having: 1) target germline
gene sequences, 2) mutated target gene sequences, or 3)
somewhat related non-target sequences. The goal is to
cleave most of the target sequences and minimal amounts
of non-targets.

In accordance with this invention, the 
single-stranded DNA is maintained in substantially that 
form using a temperature between about 37°C and about
75°C. Preferably, a temperature between about 45°C and 
about 75°C is used. More preferably, a temperature 
between 50°C and 60°C, most preferably between 55°C and 
60°C, is used. These temperatures are employed both 
when contacting the DNA with the oligonucleotide or URE 
and when cleaving the DNA using the methods of this 
invention. 

The two cleavage methods of this invention 
have several advantages. The first method allows the 
individual members of the family of single-stranded 
DNAs to be cleaved preferentially at one substantially 
conserved endonuclease recognition site. The method 
also does not require an endonuclease recognition site 
to be built into the reverse transcription or 
amplification primers. Any native or synthetic site in 
the family can be used. 

The second method has both of these 
advantages. In addition, the preferred URE method 
allows the single-stranded DNAs to be cleaved at 
positions where no endonuclease recognition site 
naturally occurs or has been synthetically constructed. 



Most importantly, both cleavage methods
permit the use of 5' and 3' primers so as to maximize
diversity and then cleavage to remove unwanted or
deleterious sequences before cloning, display and/or
expression.

After cleavage of the amplified DNAs using 
one of the methods of this invention, the DNA is 
prepared for cloning, display and/or expression. This 
is done by using a partially duplexed synthetic DNA 
adapter, whose terminal sequence is based on the 
specific cleavage site at which the amplified DNA has 
been cleaved. 

The synthetic DNA is designed such that, when
it is ligated to the cleaved single-stranded DNA, the
coding sequence is in proper reading frame so that the
desired peptide, polypeptide or protein can be displayed
on the surface of the genetic package and/or expressed.
Preferably,
the double-stranded portion of the adapter comprises 
the sequence of several codons that encode the amino 
acid sequence characteristic of the family of peptides, 
polypeptides or proteins up to the cleavage site. For 
human heavy chains, the amino acids of the 3-23 
framework are preferably used to provide the sequences 
required for expression of the cleaved DNA. 

Preferably, the double-stranded portion of
the adapter is about 12 to 100 bases in length. More
preferably, about 20 to 100 bases are used. The
double-stranded region of the adapter also preferably
contains at least one endonuclease recognition site
useful for cloning the DNA into a suitable display
and/or expression vector (or a recipient vector used to
archive the diversity). This endonuclease restriction
site may be native to the germline gene sequences used
to extend the DNA sequence. It may be also constructed
using degenerate sequences to the native germline gene
sequences. Or, it may be wholly synthetic.

The single-stranded portion of the adapter is 
complementary to the region of the cleavage in the 
single-stranded DNA. The overlap can be from about 2 
bases up to about 15 bases. The longer the overlap, 
the more efficient the ligation is likely to be. A 
preferred length for the overlap is 7 to 10. This 
allows some mismatches in the region so that diversity 
in this region may be captured. 

The single-stranded region or overlap of the
partially duplexed adapter is advantageous because it
allows DNA cleaved at the chosen site, but not other
fragments, to be captured. Such fragments would
contaminate the library with genes encoding sequences
that will not fold into proper antibodies and are
likely to be non-specifically sticky.

One illustration of the use of a partially
duplexed adaptor in the methods of this invention
involves ligating such adaptor to a human FR3 region
that has been cleaved, as described above, at
5'-ACnGT-3' using HpyCH4III, Bst4CI or TaaI.

Table 4F.2 shows the bottom strand of the
double-stranded portion of the adaptor for ligation to
the cleaved bottom-strand DNA. Since the HpyCH4III-site
is so far to the right (as shown in Table 3), a
sequence that includes the AflII-site as well as the
XbaI site can be added. This bottom strand portion of
the partially-duplexed adaptor, H43.XAExt,
incorporates both XbaI and AflII-sites. The top strand
of the double-stranded portion of the adaptor has
neither site (due to planned mismatches in the segments
opposite the XbaI and AflII-sites of H43.XAExt), but
will anneal very tightly to H43.XAExt. H43.AExt
contains only the AflII-site and is to be used with the
top strands H43.ABr1 and H43.ABr2 (which have
intentional alterations to destroy the AflII-site).

After ligation, the desired, captured DNA can
be PCR amplified again, if desired, using in the
preferred embodiment a primer to the downstream
constant region of the antibody gene and a primer to
part of the double-stranded region of the adapter. The
primers may also carry restriction endonuclease sites
for use in cloning the amplified DNA.

After ligation, and perhaps amplification, of
the partially double-stranded adapter to the
single-stranded amplified DNA, the composite DNA is cleaved at
chosen 5' and 3' endonuclease recognition sites.

The cleavage sites useful for cloning depend 
on the phage or phagemid or other vectors into which 
the cassette will be inserted and the available sites 
in the antibody genes. Table 19 provides restriction 
endonuclease data for 75 human light chains. Table 20 
shows corresponding data for 79 human heavy chains. In 
each Table, the endonucleases are ordered by increasing 
frequency of cutting. In these Tables, Nch is the 
number of chains cut by the enzyme and Ns is the number 
of sites (some chains have more than one site).
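
The two quantities Nch and Ns can be computed for any set of chains and enzymes. The following Python sketch is illustrative only. The toy chains are hypothetical and are not the chains of Table 19 or Table 20, the function names are assumptions, and the recognition sequences used (NotI GCGGCCGC, ApaLI GTGCAC, BstEII GGTNACC) are general enzyme reference data rather than values taken from this section. Only the strand as written is scanned, which is sufficient for palindromic sites.

```python
import re

def count_sites(sequence, site):
    """Count occurrences of a recognition site, treating N as any base."""
    pattern = "".join("[ACGT]" if base == "N" else base for base in site.upper())
    return len(re.findall("(?=" + pattern + ")", sequence.upper()))

def cutting_table(chains, enzymes):
    """Return (enzyme, Nch, Ns) rows ordered by increasing frequency of cutting."""
    rows = []
    for name, site in enzymes.items():
        hits = [count_sites(chain, site) for chain in chains]
        nch = sum(1 for h in hits if h > 0)  # number of chains cut
        ns = sum(hits)                       # total number of sites
        rows.append((name, nch, ns))
    return sorted(rows, key=lambda row: (row[1], row[2], row[0]))

enzymes = {"NotI": "GCGGCCGC", "ApaLI": "GTGCAC", "BstEII": "GGTNACC"}

# Hypothetical toy chains (illustration only).
chains = [
    "AAGGTCACCTTGGTGACCAA",
    "CCGGTAACCGTGCACTT",
    "TTTTGTGCACTTTT",
]

for name, nch, ns in cutting_table(chains, enzymes):
    print(f"{name}\tNch={nch}\tNs={ns}")
```

Enzymes near the top of such a table cut few or no chains and are therefore the natural candidates for cloning sites.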

From this analysis, SfiI, NotI, AflII, ApaLI,
and AscI are very suitable. SfiI and NotI are
preferably used in pCES1 to insert the heavy-chain
display segment. ApaLI and AscI are preferably used in
pCES1 to insert the light-chain display segment.

BstEII-sites occur in 97% of germ-line JH
genes. In rearranged V genes, only 54/79 (68%) of
heavy-chain genes contain a BstEII-site and 7 of
these contain two sites. Thus, 47/79 (59%) contain a
single BstEII-site. An alternative to using BstEII is
to cleave via UREs at the end of JH and ligate to a
synthetic oligonucleotide that encodes part of CH1.

One example of preparing a family of DNA
sequences using the methods of this invention involves
capturing human CDR3 diversity. As described above,
mRNAs from various autoimmune patients are reverse
transcribed into lower strand cDNA. After the top
strand RNA is degraded, the lower strand is immobilized
and a short oligonucleotide used to cleave the cDNA
upstream of CDR3. A partially duplexed synthetic DNA
adapter is then annealed to the DNA and the DNA is
amplified using a primer to the adapter and a primer to
the constant region (after FR4). The DNA is then
cleaved using BstEII (in FR4) and a restriction
endonuclease appropriate to the partially
double-stranded adapter (e.g., XbaI and AflII (in FR3)). The
DNA is then ligated into a synthetic VH skeleton such
as 3-23.

One example of preparing a single-stranded
DNA that was cleaved using the URE method involves the
human kappa chain. The cleavage site in the sense
strand of this chain is depicted in Table 17. The
oligonucleotide kapextURE is annealed to the
oligonucleotides (kaBR01UR, kaBR02UR, kaBR03UR, and
kaBR04UR) to form a partially duplex DNA. This DNA is
then ligated to the cleaved soluble kappa chains. The
ligation product is then amplified using primers
kapextUREPCR and CKForeAsc (which inserts an AscI site
after the end of C kappa). This product is then
cleaved with ApaLI and AscI and ligated to similarly
cut recipient vector.

Another example involves the cleavage of
lambda light chains, illustrated in Table 18. After
cleavage, an extender (ON_LamEx133) and four bridge
oligonucleotides (ON_LamB1-133, ON_LamB2-133, ON_LamB3-133,
and ON_LamB4-133) are annealed to form a partially duplex
DNA. That DNA is ligated to the cleaved lambda-chain
sense strands. After ligation, the DNA is amplified
with ON_Lam133PCR and a forward primer specific to the
lambda constant domain, such as CL2ForeAsc or
CL7ForeAsc (Table 130).

In human heavy chains, one can cleave almost
all genes in FR4 (downstream, i.e., toward the 3' end
of the sense strand, of CDR3) at a BstEII-site that
occurs at a constant position in a very large fraction
of human heavy-chain V genes. One then needs a site in
FR3, if only CDR3 diversity is to be captured, in FR2,
if CDR2 and CDR3 diversity is wanted, or in FR1, if all
the CDR diversity is wanted. These sites are
preferably inserted as part of the partially
double-stranded adaptor.

The preferred process of this invention is to
provide recipient vectors (e.g., for display and/or
expression) having sites that allow cloning of either
light or heavy chains. Such vectors are well known and
widely used in the art. A preferred phage display
vector in accordance with this invention is phage
MALIA3. This displays in gene III. The sequence of
the phage MALIA3 is shown in Table 21A (annotated) and
Table 21B (condensed).

The DNA encoding the selected regions of the
light or heavy chains can be transferred to the vectors
using endonucleases that cut either light or heavy
chains only very rarely. For example, light chains may
be captured with ApaLI and AscI. Heavy-chain genes are
preferably cloned into a recipient vector having SfiI,
NcoI, XbaI, AflII, BstEII, ApaI, and NotI sites. The
light chains are preferably moved into the library as
ApaLI-AscI fragments. The heavy chains are preferably
moved into the library as SfiI-NotI fragments.

Most preferably, the display is had on the
surface of a derivative of M13 phage. The most
preferred vector contains all the genes of M13, an
antibiotic resistance gene, and the display cassette.
The preferred vector is provided with restriction sites
that allow introduction and excision of members of the
diverse family of genes, as cassettes. The preferred
vector is stable against rearrangement under the growth
conditions used to amplify phage.

In another embodiment of this invention, the
diversity captured by the methods of the present
invention may be displayed and/or expressed in a
phagemid vector (e.g., pCES1) that displays and/or
expresses the peptide, polypeptide or protein. Such
vectors may also be used to store the diversity for
subsequent display and/or expression using other
vectors or phage.

In another embodiment of this invention, the
diversity captured by the methods of the present
invention may be displayed and/or expressed in a yeast
vector.

In another embodiment, the mode of display
may be through a short linker to anchor domains, one
possible anchor comprising the final portion of M13 III
("IIIstump") and a second possible anchor being the
full length III mature protein.

The IIIstump fragment contains enough of M13
III to assemble into phage but not the domains involved
in mediating infectivity. Because the w.t. III
proteins are present, the phage is unlikely to delete
the antibody genes, and phage that do delete these
segments receive only a very small growth advantage.
For each of the anchor domains, the DNA encodes the
w.t. AA sequence, but differs from the w.t. DNA
sequence to a very high extent. This will greatly
reduce the potential for homologous recombination
between the anchor and the w.t. gene that is also
present (see Example 6).

Most preferably, the present invention uses a
complete phage carrying an antibiotic-resistance gene
(such as an ampicillin-resistance gene) and the display
cassette. Because the w.t. iii and possibly viii genes
are present, the w.t. proteins are also present. The
display cassette is transcribed from a regulatable
promoter (e.g., PlacZ). Use of a regulatable promoter
allows control of the ratio of the fusion display gene
to the corresponding w.t. coat protein. This ratio
determines the average number of copies of the display
fusion per phage (or phagemid) particle.

Another aspect of the invention is a method
of displaying peptides, polypeptides or proteins (and
particularly Fabs) on filamentous phage. In the most
preferred embodiment this method displays Fabs and
comprises:

a) obtaining a cassette capturing a diversity of
segments of DNA encoding the elements:

Preg::RBS1::SS1::VL::CL::stop::RBS2::SS2::VH::CH1::
linker::anchor::stop::,

where Preg is a regulatable promoter, RBS1 is a first
ribosome binding site, SS1 is a signal sequence
operable in the host strain, VL is a member of a
diverse set of light-chain variable regions, CL is a
light-chain constant region, stop is one or more stop
codons, RBS2 is a second ribosome binding site, SS2 is
a second signal sequence operable in the host strain,
VH is a member of a diverse set of heavy-chain variable
regions, CH1 is an antibody heavy-chain first constant
domain, linker is a sequence of amino acids of one to
about 50 residues, anchor is a protein that will
assemble into the filamentous phage particle and stop
is a second example of one or more stop codons; and

b) positioning that cassette within the phage
genome to maximize the viability of the phage
and to minimize the potential for deletion of
the cassette or parts thereof.
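
The element order recited in step a) can be written down as an ordered list, which makes the two-cistron layout (light chain first, heavy-chain fusion second) explicit. This is only an illustrative sketch: the element names are those of the text, while the helper names `describe` and `cistrons` are hypothetical.

```python
# Element order of the Fab display cassette, as recited in step a).
CASSETTE = [
    "Preg", "RBS1", "SS1", "VL", "CL", "stop",
    "RBS2", "SS2", "VH", "CH1", "linker", "anchor", "stop",
]


def describe(elements):
    """Render an element list in the '::' notation used in the text."""
    return "::".join(elements)


def cistrons(elements):
    """Group elements into cistrons, each starting at a ribosome binding site."""
    groups, current = [], None
    for element in elements:
        if element.startswith("RBS"):
            current = [element]
            groups.append(current)
        elif current is not None:
            current.append(element)
    return groups


print(describe(CASSETTE))
for group in cistrons(CASSETTE):
    print(describe(group))
```

Replacing the VL, CL, VH and CH1 entries with other coding elements gives the corresponding layout for displaying proteins other than Fabs.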

The DNA encoding the anchor protein in the
above preferred cassette should be designed to encode
the same (or a closely related) amino acid sequence as
is found in one of the coat proteins of the phage, but
with a distinct DNA sequence. This is to prevent
unwanted homologous recombination with the w.t. gene.
In addition, the cassette should be placed in the
intergenic region. The positioning and orientation of
the display cassette can influence the behavior of the
phage.

In one embodiment of the invention, a
transcription terminator may be placed after the second
stop of the display cassette above (e.g., trp). This
will reduce interaction between the display cassette
and other genes in the phage antibody display vector.

In another embodiment of the methods of this
invention, the phage or phagemid can display and/or
express proteins other than Fab, by replacing the Fab
portions indicated above with other protein genes.

Various hosts can be used in the display and/or
expression aspect of this invention. Such hosts are
well known in the art. In the preferred embodiment,
where Fabs are being displayed and/or expressed, the
preferred host should grow at 30°C and be RecA- (to
reduce unwanted genetic recombination) and EndA- (to
make recovery of RF DNA easier). It is also preferred
that the host strain be easily transformed by
electroporation.

XL1-Blue MRF' satisfies most of these
preferences, but does not grow well at 30°C. XL1-Blue
MRF' does grow slowly at 38°C and thus is an acceptable
host. TG-1 is also an acceptable host although it is
RecA+ and EndA+. XL1-Blue MRF' is more preferred for
the intermediate host used to accumulate diversity
prior to final construction of the library.

After display and/or expression, the
libraries of this invention may be screened using well
known and conventionally used techniques. The selected
peptides, polypeptides or proteins may then be used to
treat disease. Generally, the peptides, polypeptides
or proteins for use in therapy or in pharmaceutical
compositions are produced by isolating the DNA encoding
the desired peptide, polypeptide or protein from the
member of the library selected. That DNA is then used
in conventional methods to produce the peptide,
polypeptide or protein it encodes in appropriate host
cells, preferably mammalian host cells, e.g., CHO
cells. After isolation, the peptide, polypeptide or
protein is used alone or with pharmaceutically
acceptable compositions in therapy to treat disease.

EXAMPLES

Example 1: RACE amplification of heavy and light chain
antibody repertoires from autoimmune patients.

Total RNA was isolated from individual blood
samples (50 ml) of 11 patients using an RNAzol™ kit
(CINNA/Biotecx), as described by the manufacturer. The
patients were diagnosed as follows:

1. SLE and phospholipid syndrome
2. Limited systemic sclerosis
3. SLE and Sjogren syndrome
4. Limited systemic sclerosis
5. Rheumatoid arthritis with active vasculitis
6. Limited systemic sclerosis and Sjogren syndrome
7. Rheumatoid arthritis and (not active) vasculitis
8. SLE and Sjogren syndrome
9. SLE
10. SLE and (active) glomerulonephritis
11. Polyarthritis/Raynaud's phenomenon

From these 11 samples of total RNA, poly-A+ RNA was
isolated using the Promega PolyATtract® mRNA Isolation
kit (Promega).

250 ng of each poly-A+ RNA sample was used to
amplify antibody heavy and light chains with the
GeneRacer™ kit (Invitrogen cat. no. L1500-01). A
schematic overview of the RACE procedure is shown in
FIG. 3.

Using the general protocol of the GeneRacer™
kit, an RNA adaptor was ligated to the 5' end of all
mRNAs. Next, a reverse transcriptase reaction was
performed in the presence of an oligo(dT15) specific
primer under conditions described by the manufacturer
in the GeneRacer™ kit.

One fifth of the cDNA from the reverse
transcriptase reaction was used in a 20 ul PCR
reaction. For amplification of the heavy chain IgM
repertoire, a forward primer based on the CH1 chain of
IgM [HuCmFOR] and a backward primer based on the
ligated synthetic adaptor sequence [5'A] were used
(see Table 22).

For amplification of the kappa and lambda
light chains, a forward primer that contains the 3'
coding-end of the cDNA [HuCkFor and HuCLFor2+HuCLfor7]
and a backward primer based on the ligated synthetic
adapter sequence [5'A] were used (see Table 22).
Specific amplification products were obtained after
30 cycles of primary PCR.

FIG. 4 shows the amplification products
obtained after the primary PCR reaction from 4
different patient samples. 8 ul of primary PCR product
from each of 4 different patients was analyzed on an
agarose gel [labeled 1, 2, 3 and 4]. For the heavy
chain, a product of approximately 950 nt is obtained,
while for the kappa and lambda light chains the product
is approximately 850 nt. M1-2 are molecular weight
markers.

PCR products were also analyzed by DNA
sequencing [10 clones from the lambda, kappa or heavy
chain repertoires]. All sequenced antibody genes
recovered contained the full coding sequence as well as
the 5' leader sequence, and the V gene diversity was the
expected diversity (compared to literature data).

For each chain type (heavy, lambda light or
kappa light), 50 ng of each of the 11 individually
amplified samples were mixed, and the mixtures were
used in secondary PCR reactions.

In all secondary PCRs approximately 1 ng
template DNA from the primary PCR mixture was used in
multiple 50 ul PCR reactions [25 cycles].

For the heavy chain, a nested biotinylated
forward primer [HuCm-Nested] was used, and a nested
5'-end backward primer located in the synthetic
adapter sequence [5'NA] was used. The 5' end of the
lower strand of the heavy chain was biotinylated.

For the light chains, a 5'-end biotinylated
nested primer in the synthetic adapter [5'NA] was used
in combination with a 3'-end primer in the constant
region of Ckappa and Clambda, extended with a sequence
coding for the AscI restriction site [kappa:
HuCkForAscI; lambda: HuCL2-FOR-ASC + HuCL7-FOR-ASC].
[The 5' end of the top-strand DNA was biotinylated.]
After gel analysis, the secondary PCR products were
pooled and purified with Promega Wizard PCR cleanup.
Approximately 25 ug biotinylated heavy chain, lambda
and kappa light chain DNA was isolated from the 11
patients.

Example 2: Capturing kappa chains with BsmAI.

A repertoire of human kappa-chain mRNAs was
prepared using the RACE method of Example 1 from a
collection of patients having various autoimmune
diseases.

This Example followed the protocol of Example
1. Approximately 2 micrograms (ug) of human kappa-chain
(Igkappa) gene RACE material with biotin attached
to the 5'-end of the upper strand was immobilized as in
Example 1 on 200 microliters (uL) of Seradyn magnetic
beads. The lower strand was removed by washing the DNA
with 2 aliquots of 200 uL of 0.1 M NaOH (pH 13), for 3
minutes for the first aliquot followed by 30 seconds
for the second aliquot. The beads were neutralized
with 200 uL of 10 mM Tris (pH 7.5), 100 mM NaCl. The
short oligonucleotides shown in Table 23 were added in
40-fold molar excess in 100 uL of NEB buffer 2 (50 mM
NaCl, 10 mM Tris-HCl, 10 mM MgCl2, 1 mM dithiothreitol,
pH 7.9) to the dry beads. The mixture was incubated at
95°C for 5 minutes then cooled down to 55°C over 30
minutes. Excess oligonucleotide was washed away with 2
washes of NEB buffer 3 (100 mM NaCl, 50 mM Tris-HCl, 10
mM MgCl2, 1 mM dithiothreitol, pH 7.9). Ten units of
BsmAI (NEB) were added in NEB buffer 3 and incubated
for 1 h at 55°C. The cleaved downstream DNA was
collected and purified over a Qiagen PCR purification
column (FIGs. 5 and 6).

FIG. 5 shows an analysis of digested kappa
single-stranded DNA. Approximately 151.5 pmol of
adapter was annealed to 3.79 pmol of immobilized kappa
single-stranded DNA followed by digestion with 15 U of
BsmAI. The supernatant containing the desired DNA was
removed and analyzed by 5% polyacrylamide gel along
with the remaining beads, which contained uncleaved
full-length kappa DNA. 189 pmol of cleaved
single-stranded DNA was purified for further analysis.
Five percent of the original full-length ssDNA
remained on the beads.

FIG. 6 shows an analysis of the
extender-cleaved kappa ligation. 180 pmol of
pre-annealed bridge/extender was ligated to 1.8 pmol of
BsmAI-digested single-stranded DNA. The ligated DNA was
purified by Qiagen PCR purification column and analyzed
on a 5% polyacrylamide gel. Results indicated that the
ligation of extender to single-stranded DNA was 95%
efficient.
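
The molar excesses quoted for FIGs. 5 and 6 follow directly from the stated picomole amounts. The short sketch below reproduces that arithmetic; the function names are hypothetical and only the amounts given in the text are used.

```python
def molar_excess(oligo_pmol, template_pmol):
    """Fold molar excess of oligonucleotide over single-stranded template."""
    return oligo_pmol / template_pmol


def oligo_needed(template_pmol, fold_excess):
    """Picomoles of oligonucleotide required for a chosen fold excess."""
    return template_pmol * fold_excess


# FIG. 5: 151.5 pmol adapter annealed to 3.79 pmol immobilized kappa ssDNA.
fig5_excess = molar_excess(151.5, 3.79)

# FIG. 6: 180 pmol bridge/extender ligated to 1.8 pmol cleaved ssDNA.
fig6_excess = molar_excess(180.0, 1.8)

print(f"FIG. 5 adapter excess: about {fig5_excess:.0f}-fold")
print(f"FIG. 6 bridge/extender excess: about {fig6_excess:.0f}-fold")
```

The FIG. 5 figure agrees with the 40-fold excess stated in the protocol, and the FIG. 6 figure with the 100-fold excess used for the adaptor ligation.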

A partially double-stranded adaptor was
prepared using the oligonucleotide shown in Table 23.
The adaptor was added to the single-stranded DNA in
100-fold molar excess along with 1000 units of T4 DNA
ligase and incubated overnight at 16°C. The excess
oligonucleotide was removed with a Qiagen PCR
purification column. The ligated material was
amplified by PCR using the primers kapPCRtl and kapfor
shown in Table 23 for 10 cycles with the program shown
in Table 24.

The soluble PCR product was run on a gel and
showed a band of approximately 700 nt, as expected
(FIGs. 7 and 8). The DNA was cleaved with enzymes
ApaLI and AscI, gel purified, and ligated to similarly
cleaved vector pCES1.

FIG. 7 shows an analysis of the PCR product
from the extender-kappa amplification. Ligated
extender-kappa single-stranded DNA was amplified with
primers specific to the extender and to the constant
region of the light chain. Two different template
amounts, 10 ng and 50 ng, were used, and 13 cycles
were used to generate approximately 1.5 ug of dsDNA,
as shown by 0.8% agarose gel analysis.

FIG. 8 shows an analysis of the purified PCR
product from the extender-kappa amplification.
Approximately 5 ug of PCR-amplified extender-kappa
double-stranded DNA was run out on a 0.8% agarose gel,
cut out, and extracted with a GFX gel purification
column. By gel analysis, 3.5 ug of double-stranded DNA
was prepared.

The assay for capturing kappa chains with
BsmAI was repeated and produced similar results.
FIG. 9A shows the DNA after it was cleaved and collected
and purified over a Qiagen PCR purification column.
FIG. 9B shows the partially double-stranded adaptor
ligated to the single-stranded DNA. This ligated
material was then amplified (FIG. 9C). The gel showed
a band of approximately 700 nt.

Table 25 shows the DNA sequence of a kappa
light chain captured by this procedure. Table 26 shows
a second sequence captured by this procedure. The
closest bridge sequence was complementary to the
sequence 5'-agccacc-3', but the sequence captured reads
5'-Tgccacc-3', showing that some mismatch in the
overlapped region is tolerated.
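
The tolerated mismatch can be located by a position-by-position comparison of the two heptamers quoted above. This is a small illustrative sketch; the helper name `mismatches` is hypothetical.

```python
def mismatches(expected, observed):
    """Return 0-based positions at which two equal-length sequences differ."""
    if len(expected) != len(observed):
        raise ValueError("sequences must have the same length")
    pairs = zip(expected.upper(), observed.upper())
    return [i for i, (a, b) in enumerate(pairs) if a != b]


# Sequence matched by the closest bridge versus the sequence actually captured.
positions = mismatches("agccacc", "Tgccacc")
print(positions)
```

The comparison reports a single difference, at the first (5') base of the seven-base overlap.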



Example 3: Construction of Synthetic CDR1 and CDR2 
Diversity in V-3-23 VH Framework. 



Synthetic diversity in Complementarity
Determining Regions (CDRs) 1 and 2 was created in the
3-23 VH framework in a two-step process: first, a vector
containing the 3-23 VH framework was constructed; and
then, synthetic CDR1 and CDR2 sequences were assembled
and cloned into this vector.

For construction of the 3-23 VH framework, 8
oligonucleotides and two PCR primers (long
oligonucleotides: topfria, botfrib, BOTFR2, botfr3, F06,
BOTFR4, ON-vgC1, and ON-vgC2; and primers: SFPRMET and
botpcrprim, shown in Table 27) that overlap were
designed based on the GenBank sequence of the 3-23 VH
framework region. The design incorporated at least one
useful restriction site in each framework region, as
shown in Table 27. In Table 27, the segments that were
synthesized are shown as bold, the overlapping regions
are underscored, and the PCR priming regions at each
end are underscored.

A mixture of these 8 oligos was combined at a
final concentration of 2.5 uM in a 20 ul PCR reaction.
The PCR mixture contained 200 uM dNTPs, 2.5 mM MgCl2,
0.02 U Pfu Turbo™ DNA Polymerase, 1 U Qiagen HotStart
Taq DNA Polymerase, and 1X Qiagen PCR buffer. The PCR
program consisted of 10 cycles of 94°C for 30 s, 55°C
for 30 s, and 72°C for 30 s.

The assembled 3-23 VH DNA sequence was then
amplified, using 2.5 ul of a 10-fold dilution from the
initial PCR in a 100 ul PCR reaction. The PCR reaction
contained 200 uM dNTPs, 2.5 mM MgCl2, 0.02 U Pfu Turbo™
DNA Polymerase, 1 U Qiagen HotStart Taq DNA Polymerase,
1X Qiagen PCR Buffer and 2 outside primers (SFPRMET and
BOTPCRPRIM) at a concentration of 1 uM. The PCR program
consisted of 23 cycles at 94°C for 30 s, 55°C for 30 s,
and 72°C for 60 s. The 3-23 VH DNA sequence was
digested and cloned into pCES1 (phagemid vector) using
the SfiI and BstEII restriction endonuclease sites.
All restriction enzymes mentioned herein were supplied
by New England BioLabs, Beverly, MA and used as per the
manufacturer's instructions.
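
The two cycling programs used above to assemble and then amplify the 3-23 VH framework can be recorded as simple data, which also gives the hold time spent cycling. This is a sketch only: the class and variable names are hypothetical, and ramp times and any initial hot-start or final extension steps are not stated in the text and are therefore left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    temp_c: int    # hold temperature in degrees Celsius
    seconds: int   # hold time in seconds


@dataclass(frozen=True)
class Program:
    cycles: int
    steps: tuple

    def cycling_seconds(self):
        """Total hold time over all cycles (ramp times not included)."""
        return self.cycles * sum(step.seconds for step in self.steps)


# Assembly of the 8 overlapping oligonucleotides: 10 cycles.
ASSEMBLY = Program(10, (Step(94, 30), Step(55, 30), Step(72, 30)))

# Amplification with the two outside primers: 23 cycles.
AMPLIFICATION = Program(23, (Step(94, 30), Step(55, 30), Step(72, 60)))

for name, program in (("assembly", ASSEMBLY), ("amplification", AMPLIFICATION)):
    print(name, program.cycling_seconds() / 60, "min")
```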

Stuffer sequences (shown in Table 28 and
Table 29) were introduced into pCES1 to replace
CDR1/CDR2 sequences (900 bases between BspEI and XbaI
RE sites) and CDR3 sequences (358 bases between AflII
and BstEII) prior to cloning the CDR1/CDR2 diversity.
This new vector was termed pCES5 and its sequence is
given in Table 29.

Having stuffers in place of the CDRs avoids
the risk that a parental sequence would be
over-represented in the library. The stuffer sequences
are fragments from the penicillase gene of E. coli. The
CDR1-2 stuffer contains restriction sites for BglII,
Bsu36I, BclI, XcmI, MluI, PvuII, HpaI, and HincII, the
underscored sites being unique within the vector pCES5.
The stuffer that replaces CDR3 contains the unique
restriction endonuclease site RsrII.

A schematic representation of the design for
CDR1 and CDR2 synthetic diversity is shown in FIG. 10.
The design was based on the presence of mutations in
DP47/3-23 and related germline genes. Diversity was
designed to be introduced at the positions within CDR1
and CDR2 indicated by the numbers in FIG. 10. The
diversity at each position was chosen to be one of the
three following schemes: 1 = ADEFGHIKLMNPQRSTVWY; 2 =
YRWVGS; 3 = PS, in which letters encode equimolar mixes
of the indicated amino acids.
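
The three variegation schemes can be tabulated to show how many amino acids each allows and how the number of encoded protein sequences grows with the number of variegated positions. The sketch below is illustrative only: the assignment of schemes to positions is shown in FIG. 10 and is not reproduced here, so the example assignment is hypothetical.

```python
from math import prod

# Variegation schemes as defined in the text (one-letter amino-acid codes).
SCHEMES = {
    1: "ADEFGHIKLMNPQRSTVWY",
    2: "YRWVGS",
    3: "PS",
}

ALL_AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def excluded(scheme):
    """Amino acids that a scheme does not allow."""
    return ALL_AMINO_ACIDS - set(SCHEMES[scheme])


def theoretical_diversity(scheme_per_position):
    """Number of distinct protein sequences for a list of per-position schemes."""
    return prod(len(SCHEMES[s]) for s in scheme_per_position)


for number, residues in SCHEMES.items():
    print(number, len(residues), "amino acids; excluded:", sorted(excluded(number)))

# Hypothetical assignment: one position of each scheme (not the FIG. 10 design).
print(theoretical_diversity([1, 2, 3]))
```

Scheme 1 allows every amino acid except cysteine, so each scheme-1 position contributes a factor of 19 to the theoretical diversity.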

For the construction of the CDR1 and CDR2
diversity, 4 overlapping oligonucleotides (ON-vgC1,
ON_Br12, ON_CD2Xba, and ON-vgC2, shown in Table 27 and
Table 30) encoding CDR1/2, plus flanking regions, were
designed. A mixture of these 4 oligos was combined at
a final concentration of 2.5 uM in a 40 ul PCR reaction.
Two of the 4 oligos contained variegated sequences
positioned at the CDR1 and the CDR2. The PCR mixture
contained 200 uM dNTPs, 2.5 U Pwo DNA Polymerase (Roche),
and 1X Pwo PCR buffer with 2 mM MgSO4. The PCR program
consisted of 10 cycles at 94°C for 30 s, 60°C for 30 s,
and 72°C for 60 s. This assembled CDR1/2 DNA sequence
was amplified, using 2.5 ul of the mixture in a 100 ul
PCR reaction. The PCR reaction contained 200 uM dNTPs,
2.5 U Pwo DNA Polymerase, 1X Pwo PCR Buffer with 2 mM
MgSO4 and 2 outside primers at a concentration of 1 uM.
The PCR program consisted of 10 cycles at 94°C for 30 s,
60°C for 30 s, and 72°C for 60 s. These variegated
sequences were digested and cloned into the 3-23 VH
framework in place of the CDR1/2 stuffer.

We obtained approximately 7 x 10^7 independent
transformants. CDR3 diversity either from donor
populations or from synthetic DNA can be cloned into
the vector containing synthetic CDR1 and CDR2
diversity.

A schematic representation of this procedure
is shown in FIG. 11. A sequence encoding the FR
regions of the human V3-23 gene segment and CDR regions
with synthetic diversity was made by oligonucleotide
assembly and cloning via BspEI and XbaI sites into a
vector that complements the FR1 and FR3 regions. Into
this library of synthetic VH segments, the
complementary VH-CDR3 sequence (top right) was cloned
via XbaI and BstEII sites. The resulting cloned VH
genes contain a combination of designed synthetic
diversity and natural diversity (see FIG. 11).

Example 4: Cleavage and ligation of the lambda light
chains with HinfI.

A schematic of the cleavage and ligation of
antibody light chains is shown in FIGs. 12A and 12B.
Approximately 2 ug of biotinylated human lambda DNA
prepared as described in Example 1 was immobilized on
200 ul Seradyn magnetic beads. The lower strand was
removed by incubation of the DNA with 200 ul of 0.1 M
NaOH (pH 13) for 3 minutes; the supernatant was removed
and an additional wash of 30 seconds with 200 ul of
0.1 M NaOH was performed. The supernatant was removed
and the beads were neutralized with 200 ul of 10 mM Tris
(pH 7.5), 100 mM NaCl. Two additional washes with 200 ul
of NEB buffer 2, containing 10 mM Tris (pH 7.9), 50 mM
NaCl, 10 mM MgCl2 and 1 mM dithiothreitol, were
performed. After immobilization, the amount of ssDNA
was estimated on a 5% PAGE-urea gel.

About 0.8 ug ssDNA was recovered and
incubated in 100 ul NEB buffer 2 containing an 80-fold
molar excess of an equimolar mix of ON_Lam1aB7,
ON_Lam2aB7, ON_Lam31B7 and ON_Lam3rB7 [each oligo in
20-fold molar excess] (see Table 31).

The mixture was incubated at 95°C for 5
minutes and then slowly cooled down to 50°C over a
period of 30 minutes. Excess oligonucleotide was
washed away with 2 washes of 200 ul of NEB buffer 2.
4 U/ug of HinfI was added and incubated for 1 hour at
50°C. Beads were mixed every 10 minutes.

After incubation the sample was purified over
a Qiagen PCR purification column and was subsequently
analysed on a 5% PAGE-urea gel (see FIG. 13A; cleavage
was more than 70% efficient).

A schematic of the ligation of the cleaved
light chains is shown in FIG. 12B. A mix of
bridge/extender pairs was prepared from the Brg/Ext
oligos listed in Table 31 (total molar excess
100-fold), combined with the cleaved DNA and 1000 U of
T4 DNA Ligase (NEB), and incubated overnight at 16°C.
After ligation of the DNA, the excess oligonucleotide
was removed with a Qiagen PCR purification column and
ligation was checked on a urea-PAGE gel (see FIG. 13B;
ligation was more than 95% efficient).

Multiple PCRs were performed containing 10 ng
of the ligated material in a 50 ul PCR reaction using
25 pmol ON lamPlePCR and 25 pmol of an equimolar mix
of Hu-CL2AscI/HuCL7AscI primer (see Example 1).

PCR was performed at 60°C for 15 cycles
using Pfu polymerase. About 1 ug of dsDNA was recovered
per PCR (see FIG. 13C) and cleaved with ApaLI and AscI
for cloning the lambda light chains in pCES2.

Example 5: Capture of human heavy-chain CDR3 
population. 

A schematic of the cleavage and ligation of 
antibody light chains is shown in FIGs. 14A and 14B. 

Approximately 3 ug of human heavy-chain (IgM)
gene RACE material with biotin attached to the 5'-end of
the lower strand was immobilized on 300 uL of Seradyn
magnetic beads. The upper strand was removed by
washing the DNA with 2 aliquots of 300 uL of 0.1 M NaOH
(pH 13), for 3 minutes for the first aliquot followed by
30 seconds for the second aliquot. The beads were
neutralized with 300 uL of 10 mM Tris (pH 7.5), 100 mM
NaCl. The REdaptors (oligonucleotides used to make
single-stranded DNA locally double-stranded) shown in
Table 32 were added in 30-fold molar excess in 200 uL
of NEB buffer 4 (50 mM potassium acetate, 20 mM
Tris-acetate, 10 mM magnesium acetate, 1 mM
dithiothreitol, pH 7.9) to the dry beads. The
REdaptors were incubated with the single-stranded DNA
at 80°C for 5 minutes then cooled down to 55°C over
30 minutes. Excess REdaptors were washed away with 2
washes of NEB buffer 4. Fifteen units of HpyCH4III
(NEB) were added in NEB buffer 4 and incubated for 1
hour at 55°C. The cleaved downstream DNA remaining on
the beads was removed from the beads using a Qiagen
Nucleotide removal column (see FIG. 15).

The Bridge/Extender pairs shown in Table 33
were added in 25-fold molar excess along with 1200
units of T4 DNA ligase and incubated overnight at 16°C.
Excess Bridge/Extender was removed with a Qiagen PCR
purification column. The ligated material was
amplified by PCR using primers H43.XAExtPCR2 and
Hucumnest shown in Table 34 for 10 cycles with the
program shown in Table 35.

The soluble PCR product was run on a gel and
showed a band of approximately 500 nt, as expected (see
FIG. 15B). The DNA was cleaved with enzymes SfiI and
NotI, gel purified, and ligated to similarly cleaved
vector pCES1.

Example 6: Description of Phage Display Vector CJRA05, 
a member of the library built in vector DY3F7. 

Table 36 contains an annotated DNA sequence
of a member of the library, CJRA05 (see FIG. 16). Table
36 is to be read as follows: on each line everything
that follows an exclamation mark "!" is a comment. All
occurrences of A, C, G, and T before "!" are the DNA
sequence. Case is used only to show that certain bases
constitute special features, such as restriction sites,
ribosome binding sites, and the like, which are labeled
below the DNA. CJRA05 is a derivative of phage DY3F7,
obtained by cloning an ApaLI to NotI fragment into
these sites in DY3F31. DY3F31 is like DY3F7 except
that the light chain and heavy chain genes have been
replaced by "stuffer" DNA that does not code for any
antibody. DY3F7 contains an antibody that binds
streptavidin, but did not come from the present
library.
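
The reading rule for Table 36 amounts to a small parser: discard everything from "!" onward on each line and keep only the letters A, C, G and T in either case. The following is a minimal sketch of that rule; the function name and the sample lines are hypothetical and are not taken from Table 36.

```python
def parse_annotated_sequence(lines):
    """Extract the DNA from annotated lines in which '!' starts a comment.

    Only A, C, G and T (upper or lower case) before the first '!' on each
    line are kept; spaces, digits and other characters are ignored. Case is
    preserved because it marks special features in the annotated table.
    """
    bases = []
    for line in lines:
        sequence_part = line.split("!", 1)[0]
        bases.extend(ch for ch in sequence_part if ch in "ACGTacgt")
    return "".join(bases)


# Hypothetical lines in the same style as the annotated table.
sample = [
    "aat GCT ! a comment that mentions ACGT",
    "! a line holding only a comment",
    "  1 gtg cac",
]
print(parse_annotated_sequence(sample))
```

Upper-casing the result before comparing it with another sequence removes the feature marking, which the text says carries no sequence information.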

The phage genes start with gene ii and
continue with genes x, v, vii, ix, viii, iii, vi, i,
and iv. Gene iii has been slightly modified in that
eight codons have been inserted between the signal
sequence and the mature protein and the final amino
acids of the signal sequence have been altered. This
allows restriction enzyme recognition sites EagI and
XbaI to be present. Following gene iv is the phage
origin of replication (ori). After ori is bla, which
confers resistance to ampicillin (ApR). The phage
genes and bla are transcribed in the same sense.

After bla is the Fab cassette (illustrated
in FIG. 17) comprising:

a) PlacZ promoter,
b) A first Ribosome Binding Site (RBS1),
c) The signal sequence from M13 iii,
d) An ApaLI RERS,
e) A light chain (a kappa L20::JK1 shortened by one
codon at the V-J boundary in this case),
f) An AscI RERS,
g) A second Ribosome Binding Site (RBS2),
h) A signal sequence, preferably PelB, which
contains,
i) An SfiI RERS,
j) A synthetic 3-23 V region with diversity in CDR1
and CDR2,
k) A captured CDR3,
l) A partially synthetic J region (FR4 after BstEII),
m) CH1,
n) A NotI RERS,
o) A His6 tag,
p) A cMyc tag,
q) An amber codon,
r) An anchor DNA that encodes the same amino-acid
sequence as codons 273 to 424 of M13 iii (as shown in
Table 37),
s) Two stop codons,
t) An AvrII RERS, and
u) A trp terminator.

The anchor (item r) encodes the same
amino-acid sequence as do codons 273 to 424 of M13 iii
but the DNA is approximately as different as possible
from the wild-type DNA sequence. In Table 36, the
III'stump runs from base 8997 to base 9455. Below the
DNA, as comments, are the differences with wild-type
iii for the comparable codons with "!W.T" at the ends
of these lines. Note that Met and Trp have only a
single codon and must be left as is. These AA types
are rare. Ser codons can be changed at all three bases,
while Leu and Arg codons can be changed at two.

In most cases, one base change can be 
introduced per codon. This has three advantages: 1) 
recombination with the wild-type gene carried elsewhere 
on the phage is less likely, 2) new restriction sites 
can be introduced, facilitating construction; and 3) 
sequencing primers that bind in only one of the two 
regions can be designed. 
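
The recoding idea described above, namely keeping the amino-acid sequence while making the DNA as different as possible from wild type, can be sketched as a codon-by-codon substitution. The code below is an illustration under stated assumptions, not the procedure used to design the anchor of Table 36: it uses the standard genetic code, picks for each codon the synonymous codon with the most base differences, and ignores codon usage and restriction-site placement. All function names are hypothetical.

```python
# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODE = dict(zip(CODONS, AMINO_ACIDS))


def differences(codon_a, codon_b):
    """Number of positions at which two codons differ."""
    return sum(x != y for x, y in zip(codon_a, codon_b))


def translate(dna):
    """Translate an in-frame DNA sequence using the standard genetic code."""
    dna = dna.upper()
    return "".join(CODE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def recode(dna):
    """Replace each codon by the synonymous codon most different from it."""
    dna = dna.upper()
    if len(dna) % 3:
        raise ValueError("sequence length must be a multiple of 3")
    recoded = []
    for i in range(0, len(dna), 3):
        codon = dna[i:i + 3]
        synonyms = [c for c in CODONS if CODE[c] == CODE[codon]]
        recoded.append(max(synonyms, key=lambda c: differences(c, codon)))
    return "".join(recoded)


wild_type = "ATGTCTCTGCGTTGGGCT"   # hypothetical: Met-Ser-Leu-Arg-Trp-Ala
variant = recode(wild_type)
print(variant, translate(variant) == translate(wild_type))
```

In this sketch Met and Trp codons are returned unchanged, a Ser codon such as TCT becomes AGC (three changes), and Leu and Arg codons change at two positions, in line with the observations in the text.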

The fragment of M13 III shown in CJRA05 is 
the preferred length for the anchor segment. 
Alternative longer or shorter anchor segments defined 
by reference to whole mature III protein may also be 
utilized. 

The sequence of M13 III consists of the
following elements: Signal Sequence::Domain 1
(D1)::Linker 1 (L1)::Domain 2 (D2)::Linker 2
(L2)::Domain 3 (D3)::Transmembrane Segment (TM)::
Intracellular anchor (IC) (see Table 38).

The pIII anchor (also known as trpIII)
preferably consists of D2::L2::D3::TM::IC. Another
embodiment for the pIII anchor consists of
D2'::L2::D3::TM::IC (where D2' comprises the last 21
residues of D2 with the first 109 residues deleted). A
further embodiment of the pIII anchor consists of
D2'(C>S)::L2::D3::TM::IC (where D2'(C>S) is D2' with
the single C converted to S), and a still further
embodiment consists of D3::TM::IC.

Table 38 shows a gene fragment comprising the 
NotI site, His6 tag, cMyc tag, an amber codon, a 
recombinant enterokinase cleavage site, and the whole 
of mature M13 III protein. The DNA used to encode this 
sequence is intentionally very different from the DNA 
of wild-type gene iii as shown by the lines denoted 
"W.T." containing the w.t. bases where these differ 
from this gene. III is divided into domains denoted
"domain 1", "linker 1", "domain 2", "linker 2", "domain 
3", "transmembrane segment", and "intracellular 
anchor" . 

Alternative preferred anchor segments 
(defined by reference to the sequence of Table 38) 
include : 

codons 1-29 joined to codons 104-435, deleting
domain 1 and retaining linker 1 to the end;

codons 1-38 joined to codons 104-435, deleting
domain 1 and retaining the rEK cleavage site plus linker
1 to the end from III;

codons 1-29 joined to codons 236-435, deleting
domain 1, linker 1, and most of domain 2 and retaining
linker 2 to the end;

codons 1-38 joined to codons 236-435, deleting
domain 1, linker 1, and most of domain 2 and retaining
linker 2 to the end and the rEK cleavage site;

codons 1-29 joined to codons 236-435 and changing
codon 240 to Ser (e.g., agc), deleting domain 1, linker
1, and most of domain 2 and retaining linker 2 to the
end; and

codons 1-38 joined to codons 236-435 and changing
codon 240 to Ser (e.g., agc), deleting domain 1, linker
1, and most of domain 2 and retaining linker 2 to the
end and the rEK cleavage site.
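
Each alternative anchor segment above is defined by joining inclusive codon ranges of the Table 38 sequence. The sketch below shows that bookkeeping with 1-based codon numbers; the function name is hypothetical and the placeholder sequence merely stands in for the 435 codons of Table 38, which are not reproduced here.

```python
def join_codon_ranges(dna, ranges):
    """Join inclusive, 1-based codon ranges of an in-frame DNA sequence."""
    return "".join(dna[3 * (first - 1):3 * last] for first, last in ranges)


# Toy check on a four-codon sequence: keep codon 1 and codons 3-4.
print(join_codon_ranges("AAACCCGGGTTT", [(1, 1), (3, 4)]))

# Placeholder for a 435-codon gene fragment (hypothetical stand-in).
placeholder = "N" * (3 * 435)

# First alternative: codons 1-29 joined to codons 104-435.
segment = join_codon_ranges(placeholder, [(1, 29), (104, 435)])
print(len(segment) // 3, "codons retained")
```

Joining codons 1-29 to codons 104-435 retains 361 codons; because whole codons are kept, the reading frame is preserved across the junction.
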

The constructs would most readily be made by
methods similar to those of Wang and Wilkinson
(BioTechniques 2001: 31(4)722-724) in which PCR is used
to copy the vector except the part to be deleted and
matching restriction sites are introduced or retained
at either end of the part to be kept. Table 39 shows
the oligonucleotides to be used in deleting parts of
the III anchor segment. The DNA shown in Table 38 has
an NheI site before the DINDDRMA recombinant
enterokinase cleavage site (rEKCS). If NheI is used in
the deletion process with this DNA, the rEKCS site
would be lost. This site could be quite useful in
cleaving Fabs from the phage and might facilitate
capture of very high-affinity antibodies. One could
mutagenize this sequence so that the NheI site would
follow the rEKCS site; an Ala Ser amino-acid sequence
is already present. Alternatively, one could use SphI
for the deletions. This would involve a slight change
in amino acid sequence but would be of no consequence.



Example 7: Selection of antigen binders from an
enriched library of human antibodies using phage vector
DY3F31.



In this example the human antibody library
used is described in de Haard et al. (Journal of
Biological Chemistry, 274(26): 18218-30 (1999)). This
library, consisting of a large non-immune human Fab
phagemid library, was first enriched on antigen, either
on streptavidin or on phenyl-oxazolone (phOx). The
methods for this are well known in the art. Two
preselected Fab libraries, the first one selected once
on immobilized phOx-BSA (R1-ox) and the second one
selected twice on streptavidin (R2-strep), were chosen
for recloning.

These enriched repertoires of phage 
antibodies, in which only a very low percentage have 
binding activity to the antigen used in selection, were 
confirmed by screening clones in an ELISA for antigen 
binding. The selected Fab genes were transferred from 
the phagemid vector of this library to the DY3F31 
vector via ApaLI-NotI restriction sites. 

DNA from the DY3F31 phage vector was 
pretreated with ATP-dependent DNase to remove 
chromosomal DNA and then digested with ApaLI and NotI. 
An extra digestion with AscI was performed in between 
to prevent self-ligation of the vector. The ApaLI/NotI 
Fab fragment from the preselected libraries was 
subsequently ligated to the vector DNA and transformed 
into competent XL1-Blue MRF' cells. 

Libraries were made using vector:insert 
ratios of 1:2 for the phOx library and 1:3 for the STREP 
library, and using 100 ng ligated DNA per 50 µl of 
electroporation-competent cells (electroporation 
conditions: one shock of 1700 V, 1 hour recovery of 
cells in rich SOC medium, plating on ampicillin- 
containing agar plates). 

This transformation resulted in a library 
size of 1.6 x 10^6 for R1-ox in DY3F31 and 2.1 x 10^6 for 
R2-strep in DY3F31. Sixteen colonies from each library 
were screened for insert, and all showed the correct 
size insert (±1400 bp) (for both libraries). 

Phage was prepared from these Fab libraries 
as follows. A representative sample of the library was 
inoculated in medium with ampicillin and glucose, and 
at OD 0.5, the medium was exchanged for ampicillin and 1 mM 
IPTG. After overnight growth at 37°C, phage was 
harvested from the supernatant by PEG-NaCl 
precipitation. Phage was used for selection on antigen. 
R1-ox was selected on phOx-BSA coated by passive 
adsorption onto immunotubes and R2-strep on 
streptavidin-coated paramagnetic beads (Dynal, Norway), 
in procedures described in de Haard et al. and Marks 
et al., Journal of Molecular Biology, 222(3): 581-97 
(1991). Phage titers and enrichments are given in 
Table 40. 

Clones from these selected libraries, dubbed 
R2-ox and R3-strep respectively, were screened for 
binding to their antigens in ELISA. 44 clones from 
each selection were picked randomly and screened as 
phage or soluble Fab for binding in ELISA. For the 
libraries in DY3F31, clones were first grown in 2TY-2% 
glucose-50 µg/ml AMP to an OD600 of approximately 0.5, 
and then grown overnight in 2TY-50 µg/ml AMP +/- 1 mM 
IPTG. Induction with IPTG may result in the production 
of both phage-Fab and soluble Fab. Therefore the 
(same) clones were also grown without IPTG. Table 41 
shows the results of an ELISA screening of the 
resulting supernatant, either for the detection of 
phage particles with antigen binding (Anti-M13 HRP = 
anti-phage antibody), or for the detection of human 
Fabs, be it on phage or as soluble fragments, either 
using the anti-myc antibody 9E10, which detects the 
myc-tag that every Fab carries at the C-terminal end of 
the heavy chain, followed by an HRP-labeled 
rabbit-anti-mouse serum (column 9E10/RAM-HRP), or with 
anti-light chain reagent followed by an HRP-labeled 
goat-anti-rabbit antiserum (anti-CK/CL GAR-HRP). 

The results show that in both cases 
antigen binders are identified in the library, either as 
Fabs on phage or with the anti-Fab reagents (Table 41). 
IPTG induction yields an increase in the number of 
positives. Also, it can be seen that for the 
phOx clones, the phage ELISA yields more positives than 
the soluble Fab ELISA, most likely due to the avid 
binding of phage. Twenty-four of the ELISA-positive 
clones were screened using PCR of the Fab insert from 
the vector, followed by digestion with BstNI. This 
yielded 17 different patterns for the phOx-binding 
Fabs in 23 samples that were correctly analyzed, and 6 
out of 24 for the streptavidin-binding clones. Thus, 
the data from the selection and screening from this 
pre-enriched non-immune Fab library show that the 
DY3F31 vector is suitable for display and selection of 
Fab fragments, and provides both soluble Fab and Fab on 
phage for screening experiments after selection. 
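
The BstNI fingerprinting step amounts to counting distinct digestion patterns among the clones analyzed. A minimal sketch follows; the fragment sizes are invented purely for illustration (the gel patterns themselves are not given in this example), and only the last line uses the counts reported above.

```python
from collections import Counter

def fingerprint_diversity(patterns):
    """Count distinct restriction patterns; each pattern is a collection
    of fragment sizes (bp) read from one gel lane."""
    counts = Counter(tuple(sorted(p)) for p in patterns)
    return len(counts), len(patterns)

# Hypothetical fragment sizes for five clones (not data from this example)
lanes = [(700, 400, 300), (400, 700, 300), (900, 500),
         (650, 450, 300), (900, 500)]
distinct, analyzed = fingerprint_diversity(lanes)
print(f"{distinct} different patterns in {analyzed} clones")

# Fractions of distinct patterns reported in the text
print(round(17 / 23, 2), round(6 / 24, 2))
```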

Example 8: Selection of Phage-antibody libraries on 
streptavidin magnetic beads. 

The following example describes a selection 
in which one first depletes a sample of the library of 
binders to streptavidin and optionally of binders to a 
non-target (i.e., a molecule other than the target that 
one does not want the selected Fab to bind). It is 
hypothesized that one has a molecule, termed a 
"competitive ligand", which binds the target and that 
an antibody which binds at the same site would be 
especially useful. 

For this procedure Streptavidin Magnetic 
Beads (Dynal) were blocked once with blocking solution 
(2% Marvel Milk, PBS (pH 7.4), 0.01% Tween-20 
("2%MPBST")) for 60 minutes at room temperature and 
then washed five times with 2%MPBST. 450 µL of beads 
were blocked for each depletion and subsequent 
selection set. 

Per selection, 6.25 µL of biotinylated 
depletion target (1 mg/mL stock in PBST) was added to 
0.250 mL of washed, blocked beads (from step 1). The 
target was allowed to bind overnight, with tumbling, at 
4°C. The next day, the beads were washed 5 times with 
PBST. 

Per selection, 0.010 mL of biotinylated 
target antigen (1 mg/mL stock in PBST) was added to 
0.100 mL of blocked and washed beads (from step 1). 
The antigen was allowed to bind overnight, with 
tumbling, at 4°C. The next day, the beads were washed 
5 times with PBST. 

In round 1, 2 x 10^12 up to 10^13 plaque forming 
units (pfu) per selection were blocked against 
non-specific binding by adding to 0.500 mL of 2%MPBS 
(=2%MPBST without Tween) for 1 hr at RT (tumble). In 
later rounds, 10^11 pfu per selection were blocked as 
done in round 1. 

Each phage pool was incubated with 50 µL of 
depletion target beads (final wash supernatant removed 
just before use) on a Labquake rotator for 10 min at 
room temperature. After incubation, the phage 
supernatant was removed and incubated with another 50 
µL of depletion target beads. This was repeated 3 more 
times using depletion target beads and twice using 
blocked streptavidin beads for a total of 7 rounds of 
depletion, so each phage pool required 350 µL of 
depletion beads. 
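
The bead volumes quoted in this protocol can be tallied as below. This is only a bookkeeping sketch; it assumes, as the text implies, that the two final depletion rounds on blocked streptavidin beads also use 50 µL each. The variable names are illustrative.

```python
ALIQUOT_UL = 50  # bead volume per depletion round

rounds_on_depletion_target = 2 + 3   # first two incubations, then 3 more
rounds_on_streptavidin_only = 2      # blocked beads carrying no target

total_rounds = rounds_on_depletion_target + rounds_on_streptavidin_only
depletion_target_ul = rounds_on_depletion_target * ALIQUOT_UL
streptavidin_only_ul = rounds_on_streptavidin_only * ALIQUOT_UL
depletion_total_ul = depletion_target_ul + streptavidin_only_ul

target_beads_ul = 100  # 0.100 mL of target-coated beads per selection
blocked_beads_needed_ul = depletion_total_ul + target_beads_ul
print(total_rounds, depletion_total_ul, blocked_beads_needed_ul)
```

Under these assumptions the totals agree with the 0.250 mL of beads loaded with depletion target and the 450 µL of beads blocked for each depletion and selection set.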

A small sample of each depleted library pool 
was taken for titering. Each library pool was added to 
0.100 mL of target beads (final wash supernatant was 
removed just before use) and allowed to incubate for 2 
hours at room temperature (tumble) . 

Beads were then washed as rapidly as possible 
(e.g., 3 minutes total) with 5 X 0.500 mL PBST and then 
2X with PBS. Phage still bound to beads after the 
washing were eluted once with 0.250 mL of competitive 
ligand (~1 µM) in PBST for 1 hour at room temperature 
on a Labquake rotator. The eluate was removed, mixed 
with 0.500 mL Minimal A salts solution and saved. For 
a second selection, 0.500 mL 100 mM TEA was used for 
elution for 10 min at RT, then neutralized in a mix of 
0.250 mL of 1 M Tris, pH 7.4 + 0.500 mL Min A salts. 

After the first selection elution, the beads 
can be eluted again with 0.300 mL of non-biotinylated 
target (1 mg/mL) for 1 hr at RT on a Labquake rotator. 
Eluted phage are added to 0.450 mL Minimal A salts. 

Three eluates (competitor from 1st selection, 
target from 1st selection and neutralized TEA elution 
from 2nd selection) were kept separate and a small 
aliquot taken from each for titering. 0.500 mL Minimal 
A salts were added to the remaining bead aliquots after 
competitor and target elution and after TEA elution. 
A small aliquot from each was taken for titering. 

Each elution and each set of eluted beads was 
mixed with 2X YT and an aliquot (e.g., 1 mL with 
1.E10/mL) of XL1-Blue MRF' E. coli cells (or other F' cell 
line) which had been chilled on ice after having been 
grown to mid-logarithmic phase, starved and 
concentrated (see procedure below - "Mid-Log prep of 
XL-1 blue MRF' cells for infection"). 

After approximately 30 minutes at room 
temperature, the phage/cell mixtures were spread onto 
Bio-Assay Dishes (243 X 243 X 18 mm, Nalge Nunc) 
containing 2XYT, 1 mM IPTG agar. The plates were 
incubated overnight at 30°C. The next day, each 
amplified phage culture was harvested from its 
respective plate. The plate was flooded with 35 mL TBS 
or LB, and cells were scraped from the plate. The 
resuspended cells were transferred to a centrifuge 
bottle. An additional 20 mL TBS or LB was used to 
remove any cells from the plate and pooled with the 
cells in the centrifuge bottle. The cells were 
centrifuged out, and phage in the supernatant was 
recovered by PEG precipitation. Over the next day, the 
amplified phage preps were titered. 

In the first round, two selections yielded 
five amplified eluates. These amplified eluates were 
panned for 2-3 additional rounds of selection 
using ~1.E12 input phage/round. For each additional 
round, the depletion and target beads were prepared the 
night before the round was initiated. 

For the elution steps in subsequent rounds, 
all elutions up to the elution step from which the 
amplified elution came were done, and 
the previous elutions were treated as washes. For the 
bead infection amplified phage, for example, the 
competitive ligand and target elutions were done and 
then tossed as washes (see below). Then the beads were 
used to infect E. coli. Two pools, therefore, yielded 
a total of 5 final elutions at the end of the 
selection. 

1st selection set 

A. Ligand amplified elution: elute w/ ligand 
for 1 hr, keep as elution 

B. Target amplified elution: elute w/ ligand 
for 1 hr, toss as wash; elute w/ target for 1 
hr, keep as elution 

C. Bead infect. amp. elution: elute w/ 
ligand for 1 hr, toss as wash; elute w/ target 
for 1 hr, toss as wash; elute w/ cell 
infection, keep as elution 

2nd selection set 

A. TEA amplified elution: elute w/ TEA 
10 min, keep as elution 

B. Bead infect. amp. elution: elute w/ 
TEA 10 min, toss as wash; elute w/ cell 
infection, keep as elution 
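
The elution scheme above can be written down as ordered step lists in which only the last step of each list is kept. The sketch below is one hypothetical way to encode it (the dictionary layout and the `plan` helper are illustrative, not part of the protocol) and confirms that the two selection sets give five final elutions.

```python
# Each amplified pool is defined by the ordered elution steps applied to
# the beads; only the last step is kept, earlier ones are tossed as washes.
SELECTION_SETS = {
    "1st": {
        "A. ligand": ["ligand"],
        "B. target": ["ligand", "target"],
        "C. bead infection": ["ligand", "target", "cell infection"],
    },
    "2nd": {
        "A. TEA": ["TEA"],
        "B. bead infection": ["TEA", "cell infection"],
    },
}

def plan(steps):
    """Split an ordered step list into (washes, kept elution)."""
    return steps[:-1], steps[-1]

final_elutions = []
for set_name, pools in SELECTION_SETS.items():
    for pool, steps in pools.items():
        washes, kept = plan(steps)
        final_elutions.append((set_name, pool, kept))
        print(f"{set_name} set, {pool}: toss {washes or 'nothing'}, keep {kept}")

print(len(final_elutions))
```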

Mid-log prep of XL1 blue MRF' cells for infection 

(based on Barbas et al. Phage Display manual procedure) 

XL1 blue MRF' cells were cultured in NZCYM (12.5 mg/mL 
tet) at 37°C and 250 rpm overnight. A 500 mL 
culture was started in a 2 liter flask by diluting cells 1/50 in 
NZCYM/tet (10 mL overnight culture added) and incubated 
at 37°C at 250 rpm until an OD600 of 0.45 (1.5-2 hrs) was 
reached. Shaking was reduced to 100 rpm for 10 min. 
When the OD600 reached 0.55-0.65, cells were 
transferred to 2 x 250 mL centrifuge bottles and 
centrifuged at 600 g for 15 min at 4°C. The supernatant 
was poured off. Residual liquid was removed with a 
pipette. 

The pellets were gently resuspended (not 
pipetting up and down) in the original volume of 1 X 
Minimal A salts at room temp. The resuspended cells 
were transferred back into the 2-liter flask and shaken at 100 
rpm for 45 min at 37°C. This process was performed in 
order to starve the cells and restore pili. The cells 
were transferred to 2 x 250 mL centrifuge bottles, and 
centrifuged as earlier. 

The cells were gently resuspended in ice cold 
Minimal A salts (5 mL per 500 mL original culture) . 
The cells were put on ice for use in infections as soon 
as possible. 
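
The two volume ratios used in this preparation are simple to verify. The helper names below are illustrative only.

```python
def inoculum_ml(culture_ml, dilution_factor):
    """Volume of overnight culture needed for a 1/dilution_factor dilution."""
    return culture_ml / dilution_factor

def concentration_factor(original_ml, final_ml):
    """Fold concentration when cells are resuspended in a smaller volume."""
    return original_ml / final_ml

print(inoculum_ml(500, 50))          # mL of overnight culture for 500 mL
print(concentration_factor(500, 5))  # fold concentration in Minimal A salts
```

A 1/50 dilution into 500 mL matches the 10 mL of overnight culture stated above, and resuspending in 5 mL per 500 mL of original culture concentrates the starved cells 100-fold.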

The phage eluates were brought up to 7.5 mL 
with 2XYT medium and 2.5 mL of cells were added. Beads 
were brought up to 3 mL with 2XYT and 1 mL of cells 
were added. The mixtures were incubated at 37°C for 30 min. The cells 
were plated on 2XYT, 1 mM IPTG agar large NUNC plates 
and incubated for 18 hr at 30°C. 

Example 9: Incorporation of synthetic region in FR1/3 
region. 

Described below are examples for 
incorporating fixed residues in antibody sequences 
for light chain kappa and lambda genes, and for heavy 
chains. The experimental conditions and 
oligonucleotides used for the examples below have been 
described in previous examples (e.g., Examples 3 & 4). 

The process for incorporating fixed FR1 
residues in an antibody lambda sequence consists of 3 
steps (see FIG. 18): (1) annealing of single-stranded 
DNA material encoding VL genes to a partially 
complementary oligonucleotide mix (indicated with Ext 
and Bridge), to anneal in this example to the region 
encoding residues 5-7 of the FR1 of the lambda genes 
(indicated with X..X; within the lambda genes the 
overlap may sometimes not be perfect); (2) ligation of 
this complex; (3) PCR of the ligated material with the 
indicated primer ('PCRpr') and for example one primer 
based within the VL gene. In this process the first few 
residues of all lambda genes will be encoded by the 
sequences present in the oligonucleotides (Ext., Bridge 
or PCRpr). After the PCR, the lambda genes can be 
cloned using the indicated restriction site for ApaLI. 

The process for incorporating fixed FR1 
residues in an antibody kappa sequence (FIG. 19) 
consists of 3 steps: (1) annealing of single-stranded 
DNA material encoding VK genes to a partially 
complementary oligonucleotide mix (indicated with Ext 
and Bri), to anneal in this example to the region 
encoding residues 8-10 of the FR1 of the kappa genes 
(indicated with X..X; within the kappa genes the 
overlap may sometimes not be perfect); (2) ligation of 
this complex; (3) PCR of the ligated material with the 
indicated primer ('PCRpr') and for example one primer 
based within the VK gene. In this process the first few 
(8) residues of all kappa genes will be encoded by the 
sequences present in the oligonucleotides (Ext., Bridge 
or PCRpr). After the PCR, the kappa genes can be 
cloned using the indicated restriction site for ApaLI. 

The process of incorporating fixed FR3 
residues in an antibody heavy chain sequence (FIG. 20) 
consists of 3 steps: (1) annealing of single-stranded 
DNA material encoding part of the VH genes (for example 
encoding FR3, CDR3 and FR4 regions) to a partially 
complementary oligonucleotide mix (indicated with Ext 
and Bridge), to anneal in this example to the region 
encoding residues 92-94 (within the FR3 region) of VH 
genes (indicated with X..X; within the VH genes the 
overlap may sometimes not be perfect); (2) ligation of 
this complex; (3) PCR of the ligated material with the 
indicated primer ('PCRpr') and for example one primer 
based within the VH gene (such as in the FR4 region). 
In this process certain residues of all VH genes will 
be encoded by the sequences present in the 
oligonucleotides used here, in particular from PCRpr 
(for residues 70-73), or from Ext/Bridge 
oligonucleotides (residues 74-91). After the PCR, the 
partial VH genes can be cloned using the indicated 
restriction site for XbaI. 
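
The anneal-and-ligate step shared by the three processes above can be pictured with a toy model. The sketch below is only an illustration under stated assumptions: all sequences are invented (they are not the oligonucleotides of FIGS. 18-20), the bridge is treated as a simple splint complementary to the 3' end of Ext and the 5' end of the gene fragment, and a small number of mismatches is tolerated on the gene side to reflect the imperfect overlap mentioned in the text. The function names and the `max_mismatch` parameter are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def revcomp(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def mismatches(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))

def bridge_ligate(ext, gene_fragment, bridge, ext_overlap, max_mismatch=1):
    """Join ext to gene_fragment if the bridge can hold the two ends together.

    The bridge (written 5'->3') is complementary to the last ext_overlap
    bases of ext followed by the first bases of gene_fragment. Mismatches
    are tolerated only on the gene side. Returns None if no ligation.
    """
    splint_target = revcomp(bridge)           # sequence the bridge pairs with
    ext_part = splint_target[:ext_overlap]
    gene_part = splint_target[ext_overlap:]
    if not ext.endswith(ext_part):
        return None
    if mismatches(gene_part, gene_fragment[:len(gene_part)]) > max_mismatch:
        return None
    return ext + gene_fragment

# Invented sequences for illustration only
ext = "AGTGCACAGTCT"                 # carries an ApaLI site (GTGCAC)
gene = "GCCCTGACTCAGCCT"
bridge = revcomp(ext[-6:] + gene[:9])

perfect = bridge_ligate(ext, gene, bridge, ext_overlap=6)
one_off = bridge_ligate(ext, "GCCGTGACTCAGCCT", bridge, ext_overlap=6)
too_far = bridge_ligate(ext, "GAAGTGACTCAGCCT", bridge, ext_overlap=6)
print(perfect, one_off, too_far)
```

In this model every ligated product begins with the Ext sequence, which is why the residues upstream of the annealing region end up fixed across all genes.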

It will be understood that the foregoing is 
only illustrative of the principles of this invention 
and that various modifications can be made by those 
skilled in the art without departing from the scope 
and spirit of the invention. 

Table 1: Human GLG FR3 sequences 
! VH1 
! 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 
agg gtc acc atg acc agg gac acg tcc atc agc aca gcc tac atg 
! 81 82 82a 82b 82c 83 84 85 86 87 88 89 90 91 92 
gag ctg agc agg ctg aga tct gac gac acg gcc gtg tat tac tgt 
! 93 94 95 
gcg aga ga ! 1-02# 1 
aga gtc acc att acc agg gac aca tcc gcg agc aca gcc tac atg 
gag ctg agc agc ctg aga tct gaa gac acg gct gtg tat tac tgt 
gcg aga ga ! 1-03# 2 
aga gtc acc atg acc agg aac acc tcc ata agc aca gcc tac atg 
gag ctg agc agc ctg aga tct gag gac acg gcc gtg tat tac tgt 
gcg aga gg ! 1-08# 3 
aga gtc acc atg acc aca gac aca tcc acg agc aca gcc tac atg 
gag ctg agg agc ctg aga tct gac gac acg gcc gtg tat tac tgt 
gcg aga ga ! 1-18# 4 
aga gtc acc atg acc gag gac aca tct aca gac aca gcc tac atg 
gag ctg agc agc ctg aga tct gag gac acg gcc gtg tat tac tgt 
gca aca ga ! 1-24# 5 
aga gtc acc att acc agg gac agg tct atg agc aca gcc tac atg 
gag ctg agc agc ctg aga tct gag gac aca gcc atg tat tac tgt 
gca aga ta ! 1-45# 6 
aga gtc acc atg acc agg gac acg tcc acg agc aca gtc tac atg 
gag ctg agc agc ctg aga tct gag gac acg gcc gtg tat tac tgt 
gcg aga ga ! 1-46# 7 
aga gtc acc att acc agg gac atg tcc aca agc aca gcc tac atg 
gag ctg agc agc ctg aga tcc gag gac acg gcc gtg tat tac tgt 
gcg gca ga ! 1-58# 8 
aga gtc acg att acc gcg gac gaa tcc acg agc aca gcc tac atg 
gag ctg agc agc ctg aga tct gag gac acg gcc gtg tat tac tgt 
gcg aga ga ! 1-69# 9 
aga gtc acg att acc gcg gac aaa tcc acg agc aca gcc tac atg 
gag ctg agc agc ctg aga tct gag gac acg gcc gtg tat tac tgt 
gcg aga ga ! 1-e# 10 
aga gtc acc ata acc gcg gac acg tct aca gac aca gcc tac atg 
gag ctg agc agc ctg aga tct gag gac acg gcc gtg tat tac tgt 
gca aca ga ! 1-f# 11 
! VH2 
... ... ... atc acc aag gac acc tcc aaa aac cag gtg gtc ctt 
... ... ... aac atg gac cct gtg gac aca gcc aca tat tac tgt 
... c ! 2-05# 12 
... ... ... atc tcc aag gac acc tcc aaa agc cag gtg gtc ctt 
... ... ... aac atg gac cct gtg gac aca gcc aca tat tac tgt 
... c ! 2-26# 13 
... ... ... atc tcc aag gac acc tcc aaa aac cag gtg gtc ctt 
... ... ... aac atg gac cct gtg gac aca gcc acg tat tac tgt 
... c ! 2-70# 14 
! VH3 
... ... ... atc tcc aga gac aac gcc aag aac tca ctg tat ctg 
... ... ... agc ctg aga gcc gag gac acg gct gtg tat tac tgt 
... ! 3-07# 15 
... ... ... atc tcc aga gac aac gcc aag aac tcc ctg tat ctg 
... ... ... agt ctg aga gct gag gac acg gcc ttg tat tac tgt 
... a ! 3-09# 16 
... ... ... atc tcc agg gac aac gcc aag aac tca ctg tat ctg 
... ... ... agc ctg aga gcc gag gac acg gcc gtg tat tac tgt 
... ! 3-11# 17 
... ... ... atc tcc aga gaa aat gcc aag aac tcc ttg tat ctt 
... ... ... agc ctg aga gcc ggg gac acg gct gtg tat tac tgt 
... ! 3-13# 18 
... ... ... atc tca aga gat gat tca aaa aac acg ctg tat ctg 
... ... ... agc ctg aaa acc gag gac aca gcc gtg tat tac tgt 
... ! 3-15# 19 
... ... ... atc tcc aga gac aac gcc aag aac tcc ctg tat ctg 
... ... ... agt ctg aga gcc gag gac acg gcc ttg tat cac tgt 
... ! 3-20# 20 
... ... ... atc tcg aga gac aac gcc aag aac tca ctg tat ctg 
... ... ... agc ctg aga gcc gag gac acg gct gtg tat tac tgt 
... ! 3-21# 21 
... ... ... atc tcc aga gac aat tcc aag aac acg ctg tat ctg 
... ... ... agc ctg aga gcc gag gac acg gcc gta tat tac tgt 
... ! 3-23# 22 
... ... ... atc tcc aga gac aat tcc aag aac acg ctg tat ctg 
... ... ... agc ctg aga gct gag gac acg gct gtg tat tac tgt 
... ! 3-30# 23 
... ... ... atc tcc aga gac aat tcc aag aac acg ctg tat ctg 
... ... ... agc ctg aga gct gag gac acg gct gtg tat tac tgt 
gcg aga ga ! 3303# 24 
cga ttc acc atc tcc aga gac aat tcc aag aac acg ctg tat ctg 
caa atg aac agc ctg aga gct gag gac acg gct gtg tat tac tgt 
gcg aaa ga ! 3305# 25 
cga ttc acc atc tcc aga gac aat tcc aag aac acg ctg tat ctg 
caa atg aac agc ctg aga gcc gag gac acg gct gtg tat tac tgt 
gcg aga ga ! 3-33# 26 
cga ttc acc atc tcc aga gac aac agc aaa aac tcc ctg tat ctg 
caa atg aac agt ctg aga act gag gac acc gcc ttg tat tac tgt 
gca aaa gat a ! 3-43# 27 
cga ttc acc atc tcc aga gac aat gcc aag aac tca ctg tat ctg 
caa atg aac agc ctg aga gac gag gac acg gct gtg tat tac tgt 
gcg aga ga ! 3-48# 28 
aga ttc acc atc tca aga gat ggt tcc aaa agc atc gcc tat ctg 
caa atg aac agc ctg aaa acc gag gac aca gcc gtg tat tac tgt 
act aga ga ! 3-49# 29 
cga ttc acc atc tcc aga gac aat tcc aag aac acg ctg tat ctt 
caa atg aac agc ctg aga gcc gag gac acg gcc gtg tat tac tgt 
gcg aga ga ! 3-53# 30 
aga ttc acc atc tcc aga gac aat tcc aag aac acg ctg tat ctt 
caa atg ggc agc ctg aga gct gag gac atg gct gtg tat tac tgt 
gcg aga ga ! 3-64# 31 
aga ttc acc atc tcc aga gac aat tcc aag aac acg ctg tat ctt 
caa atg aac agc ctg aga gct gag gac acg gct gtg tat tac tgt 
gcg aga ga ! 3-66# 32 
aga ttc acc atc tca aga gat gat tca aag aac tca ctg tat ctg 
caa atg aac agc ctg aaa acc gag gac acg gcc gtg tat tac tgt 
gct aga ga ! 3-72# 33 
agg ttc acc atc tcc aga gat gat tca aag aac acg gcg tat ctg 
caa atg aac agc ctg aaa acc gag gac acg gcc gtg tat tac tgt 
act aga ca ! 3-73# 34 
cga ttc acc atc tcc aga gac aac gcc aag aac acg ctg tat ctg 
caa atg aac agt ctg aga gcc gag gac acg gct gtg tat tac tgt 
gca aga ga ! 3-74# 35 
aga ttc acc atc tcc aga gac aat tcc aag aac acg ctg cat ctt 
caa atg aac agc ctg aga gct gag gac acg gct gtg tat tac tgt 
aag aaa ga ! 3-d# 36 
! VH4 
cga gtc acc ata tca gta gac aag tcc aag aac cag ttc tcc ctg 
aag ctg agc tct gtg acc gcc gcg gac acg gcc gtg tat tac tgt 
gcg aga ga ! 4-04# 37 
cga gtc acc atg tca gta gac acg tcc aag aac cag ttc tcc ctg 
aag ctg agc tct gtg acc gcc gtg gac acg gcc gtg tat tac tgt 
gcg aga aa ! 4-28# 38 
cga gtt acc ata tca gta gac acg tct aag aac cag ttc tcc ctg 
aag ctg agc tct gtg act gcc gcg gac acg gcc gtg tat tac tgt 
gcg aga ga ! 4301# 39 
cga gtc acc ata tca gta gac agg tcc aag aac cag ttc tcc ctg 
aag ctg agc tct gtg acc gcc gcg gac acg gcc gtg tat tac tgt 
gcc aga ga ! 4302# 40 
cga gtt acc ata tca gta gac acg tcc aag aac cag ttc tcc ctg 
aag ctg agc tct gtg act gcc gca gac acg gcc gtg tat tac tgt 
gcc aga ga ! 4304# 41 
cga gtt acc ata tca gta gac acg tct aag aac cag ttc tcc ctg 
aag ctg agc tct gtg act gcc gcg gac acg gcc gtg tat tac tgt 
gcg aga ga ! 4-31# 42 
cga gtc acc ata tca gta gac acg tcc aag aac cag ttc tcc ctg 
aag ctg agc tct gtg acc gcc gcg gac acg gct gtg tat tac tgt 
gcg aga ga ! 4-34# 43 
cga gtc acc ata tcc gta gac acg tcc aag aac cag ttc tcc ctg 
aag ctg agc tct gtg acc gcc gca gac acg gct gtg tat tac tgt 
gcg aga ca ! 4-39# 44 
cga gtc acc ata tca gta gac acg tcc aag aac cag ttc tcc ctg 
aag ctg agc tct gtg acc gct gcg gac acg gcc gtg tat tac tgt 
gcg aga ga ! 4-59# 45 
cga gtc acc ata tca gta gac acg tcc aag aac cag ttc tcc ctg 
aag ctg agc tct gtg acc gct gcg gac acg gcc gtg tat tac tgt 
gcg aga ga ! 4-61# 46 
cga gtc acc ata tca gta gac acg tcc aag aac cag ttc tcc ctg 
aag ctg agc tct gtg acc gcc gca gac acg gcc gtg tat tac tgt 
gcg aga ga ! 4-b# 47 
! VH5 
cag gtc acc atc tca gcc gac aag tcc atc agc acc gcc tac ctg 
cag tgg agc agc ctg aag gcc tcg gac acc gcc atg tat tac tgt 
gcg aga ca ! 5-51# 48 
cac gtc acc atc tca gct gac aag tcc atc agc act gcc tac ctg 
cag tgg agc agc ctg aag gcc tcg gac acc gcc atg tat tac tgt 
gcg aga ! 5-a# 49 
! VH6 
cga ata acc atc aac cca gac aca tcc aag aac cag ttc tcc ctg 
cag ctg aac tct gtg act ccc gag gac acg gct gtg tat tac tgt 
gca aga ga ! 6-1# 50 
! VH7 
cgg ttt gtc ttc tcc ttg gac acc tct gtc agc acg gca tat ctg 
cag atc tgc agc cta aag gct gag gac act gcc gtg tat tac tgt 
gcg aga ga ! 74.1# 51 
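
The codon rows of Table 1 can be checked by translating them. The following sketch is offered as a convenience, not as part of the table: it builds the standard genetic code from the usual TCAG ordering and translates the first VH1 entry (1-02), whose codons are typed in here by hand. The trailing partial codon "ga" is skipped.

```python
from itertools import product

BASES = "tcag"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code: codons enumerated with the first base varying slowest
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def translate(codons):
    """Translate a space-separated codon string, ignoring partial codons."""
    return "".join(CODON_TABLE[c] for c in codons.split() if len(c) == 3)

vh1_02 = ("agg gtc acc atg acc agg gac acg tcc atc agc aca gcc tac atg "
          "gag ctg agc agg ctg aga tct gac gac acg gcc gtg tat tac tgt "
          "gcg aga ga")
print(translate(vh1_02))
```

The 32 translated residues correspond to positions 66 through 94 of the numbering shown in the table header (which includes 82a, 82b and 82c).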
Table 2: Enzymes that either cut 15 or more human GLGs or have 5+-base recognition in FR3 
Typical entry: 

REname Recognition #sites 
GLGid#: base# GLGid#: base# GLGid#: base# 


BstEII Ggtnacc 2 
1: 3 48: 3 
There are 2 hits at base# 3 
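
Entries of this kind can be regenerated by scanning an FR3 sequence for each recognition sequence. The sketch below is a minimal illustration, assuming that base# is the 1-based position of the first base of the recognition sequence and that only the strand shown in Table 1 is scanned; the function name and the hand-typed 1-02 sequence are illustrative. IUPAC ambiguity codes (N, R, Y, S and so on) are expanded into regular-expression character classes, and a lookahead is used so that overlapping sites are all reported.

```python
import re

# IUPAC ambiguity codes needed for the recognition sequences in Table 2
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "[ACGT]",
         "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
         "K": "[GT]", "M": "[AC]"}

def find_sites(seq, recognition):
    """Return 1-based start positions of a recognition sequence on the
    given strand, allowing overlapping matches."""
    seq = seq.replace(" ", "").upper()
    pattern = "".join(IUPAC[b] for b in recognition.upper())
    return [m.start() + 1 for m in re.finditer(f"(?={pattern})", seq)]

# FR3 of GLG 1 (1-02) from Table 1, typed in by hand
fr3_1_02 = ("agg gtc acc atg acc agg gac acg tcc atc agc aca gcc tac atg "
            "gag ctg agc agg ctg aga tct gac gac acg gcc gtg tat tac tgt "
            "gcg aga ga")

for name, site in [("BstEII", "Ggtnacc"), ("NlaIII", "CATG"),
                   ("DdeI", "Ctnag"), ("BglII", "Agatct")]:
    print(name, find_sites(fr3_1_02, site))
```

For GLG 1 the positions found this way agree with the corresponding "1:" entries listed for these enzymes in the table.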



MaeIII gtnac 36 
1: 4 2: 4 3: 4 4: 4 5: 4 6: 4 
7: 4 8: 4 9: 4 10: 4 11: 4 37: 4 
37: 58 38: 4 38: 58 39: 4 39: 58 40: 4 
40: 58 41: 4 41: 58 42: 4 42: 58 43: 4 
43: 58 44: 4 44: 58 45: 4 45: 58 46: 4 
46: 58 47: 4 47: 58 48: 4 49: 4 50: 58 
There are 24 hits at base# 4 

Tsp45I gtsac 33 
1: 4 2: 4 3: 4 4: 4 5: 4 6: 4 
7: 4 8: 4 9: 4 10: 4 11: 4 37: 4 
37: 58 38: 4 38: 58 39: 58 40: 4 40: 58 
41: 58 42: 58 43: 4 43: 58 44: 4 44: 58 
45: 4 45: 58 46: 4 46: 58 47: 4 47: 58 
48: 4 49: 4 50: 58 
There are 21 hits at base# 4 

HphI tcacc 45 
1: 5 2: 5 3: 5 4: 5 5: 5 6: 5 
7: 5 8: 5 11: 5 12: 5 12: 11 13: 5 
14: 5 15: 5 16: 5 17: 5 18: 5 19: 5 
20: 5 21: 5 22: 5 23: 5 24: 5 25: 5 
26: 5 27: 5 28: 5 29: 5 30: 5 31: 5 
32: 5 33: 5 34: 5 35: 5 36: 5 37: 5 
38: 5 40: 5 43: 5 44: 5 45: 5 46: 5 
47: 5 48: 5 49: 5 
There are 44 hits at base# 5 

NlaIII CATG 26 
1: 9 1: 42 2: 42 3: 9 3: 42 4: 9 
4: 42 5: 9 5: 42 6: 42 6: 78 7: 9 
7: 42 8: 21 8: 42 9: 42 10: 42 11: 42 
12: 57 13: 48 13: 57 14: 57 31: 72 38: 9 
48: 78 49: 78 
There are 11 hits at base# 42 
There are 1 hits at base# 48 Could cause raggedness 

BsaJI Ccnngg 37 
1: 14 2: 14 5: 14 6: 14 7: 14 8: 14 
8: 65 9: 14 10: 14 11: 14 12: 14 13: 14 
14: 14 15: 65 17: 14 17: 65 18: 65 19: 65 
20: 65 21: 65 22: 65 26: 65 29: 65 30: 65 
33: 65 34: 65 35: 65 37: 65 38: 65 39: 65 
40: 65 42: 65 43: 65 48: 65 49: 65 50: 65 
51: 14 
There are 23 hits at base# 65 
There are 14 hits at base# 14 

AluI AGct 42 
1: 47 2: 47 3: 47 4: 47 5: 47 6: 47 
7: 47 8: 47 9: 47 10: 47 11: 47 16: 63 
23: 63 24: 63 25: 63 31: 63 32: 63 36: 63 
37: 47 37: 52 38: 47 38: 52 39: 47 39: 52 
40: 47 40: 52 41: 47 41: 52 42: 47 42: 52 
43: 47 43: 52 44: 47 44: 52 45: 47 45: 52 
46: 47 46: 52 47: 47 47: 52 49: 15 50: 47 
There are 23 hits at base# 47 
There are 11 hits at base# 52 Only 5 bases from 47 

BlpI GCtnagc 21 
1: 48 2: 48 3: 48 5: 48 6: 48 7: 48 
8: 48 9: 48 10: 48 11: 48 37: 48 38: 48 
39: 48 40: 48 41: 48 42: 48 43: 48 44: 48 
45: 48 46: 48 47: 48 
There are 21 hits at base# 48 

MwoI GCNNNNNnngc 19 
1: 48 2: 28 19: 36 22: 36 23: 36 24: 36 
25: 36 26: 36 35: 36 37: 67 39: 67 40: 67 
41: 67 42: 67 43: 67 44: 67 45: 67 46: 67 
47: 67 
There are 10 hits at base# 67 
There are 7 hits at base# 36 

DdeI Ctnag 71 
1: 49 1: 58 2: 49 2: 58 3: 49 3: 58 
3: 65 4: 49 4: 58 5: 49 5: 58 5: 65 
6: 49 6: 58 6: 65 7: 49 7: 58 7: 65 
8: 49 8: 58 9: 49 9: 58 9: 65 10: 49 
10: 58 10: 65 11: 49 11: 58 11: 65 15: 58 
16: 58 16: 65 17: 58 18: 58 20: 58 21: 58 
22: 58 23: 58 23: 65 24: 58 24: 65 25: 58 
25: 65 26: 58 27: 58 27: 65 28: 58 30: 58 
31: 58 31: 65 32: 58 32: 65 35: 58 36: 58 
36: 65 37: 49 38: 49 39: 26 39: 49 40: 49 
41: 49 42: 26 42: 49 43: 49 44: 49 45: 49 
46: 49 47: 49 48: 12 49: 12 51: 65 
There are 29 hits at base# 58 
There are 22 hits at base# 49 Only nine bases from 58 
There are 16 hits at base# 65 Only seven bases from 58 

BglII Agatct 11 

1: 61 2: 61 3: 61 4: 61 5: 61 6: 61 
7: 61 9: 61 10: 61 11: 61 51: 47 
There are 10 hits at base# 61 

BstYI Rgatcy 12 

1: 61 2: 61 3: 61 4: 61 5: 61 6: 61 
7: 61 8: 61 9: 61 10: 61 11: 61 51: 47 
There are 11 hits at base# 61 

Hpy188I TCNga 17 

1: 64 2: 64 3: 64 4: 64 5: 64 6: 64 
7: 64 8: 64 9: 64 10: 64 11: 64 16: 57 
20: 57 27: 57 35: 57 48: 67 49: 67 
There are 11 hits at base# 64 
There are 4 hits at base# 57 
There are 2 hits at base# 67 Could be ragged. 

MslI CAYNNnnRTG 44 

1: 72 2: 72 3: 72 4: 72 5: 72 6: 72 
7: 72 8: 72 9: 72 10: 72 11: 72 15: 72 
17: 72 18: 72 19: 72 21: 72 23: 72 24: 72 
25: 72 26: 72 28: 72 29: 72 30: 72 31: 72 
32: 72 33: 72 34: 72 35: 72 36: 72 37: 72 
38: 72 39: 72 40: 72 41: 72 42: 72 43: 72 
44: 72 45: 72 46: 72 47: 72 48: 72 49: 72 
50: 72 51: 72 
There are 44 hits at base# 72 

flj 20 BsiEI CGRYcg 23 

W 1: 74 3: 74 4: 74 5: 74 7: 74 8: 74 

^ 9: 74 10: 74 11: 74 17: 74 22: 74 30: 74 

33: 74 34: 74 37: 74 38: 74 39: 74 40: 74 

41: 74 42: 74 45: 74 46: 74 47: 74 

25 There are 23 hits at base# 74 

Eael Yggccr 23 

1: 74 3: 74 4: 74 5: 74 7: 74 8: 74 

9: 74 10: 74 11: 74 17: 74 22: 74 30: 74 

30 33: 74 34: 74 37: 74 38: 74 39: 74 40: 74 

41: 74 42: 74 45: 74 46: 74 47: 74 
There are 23 hits at base# 74 

EagI Cggccg 23

1: 74 3: 74 4: 74 5: 74 7: 74 8: 74
9: 74 10: 74 11: 74 17: 74 22: 74 30: 74
33: 74 34: 74 37: 74 38: 74 39: 74 40: 74
41: 74 42: 74 45: 74 46: 74 47: 74
There are 23 hits at base# 74

HaeIII GGcc 27

1: 75 3: 75 4: 75 5: 75 7: 75 8: 75
9: 75 10: 75 11: 75 16: 75 17: 75 20: 75
22: 75 30: 75 33: 75 34: 75 37: 75 38: 75
39: 75 40: 75 41: 75 42: 75 45: 75 46: 75
47: 75 48: 63 49: 63
There are 25 hits at base# 75

Bst4CI ACNgt 65 °C 63 Sites There is a third isoschizomer

1: 86 2: 86 3: 86 4: 86 5: 86 6: 86
7: 34 7: 86 8: 86 9: 86 10: 86 11: 86
12: 86 13: 86 14: 86 15: 36 15: 86 16: 53
16: 86 17: 36 17: 86 18: 86 19: 86 20: 53
20: 86 21: 36 21: 86 22: 0 22: 86 23: 86
24: 86 25: 86 26: 86 27: 53 27: 86 28: 36
28: 86 29: 86 30: 86 31: 86 32: 86 33: 36
33: 86 34: 86 35: 53 35: 86 36: 86 37: 86
38: 86 39: 86 40: 86 41: 86 42: 86 43: 86
44: 86 45: 86 46: 86 47: 86 48: 86 49: 86
50: 86 51: 0 51: 86
There are 51 hits at base# 86 All the other sites c
HpyCH4III ACNgt 63

1: 86 2: 86 3: 86 4: 86 5: 86 6: 86
7: 34 7: 86 8: 86 9: 86 10: 86 11: 86
12: 86 13: 86 14: 86 15: 36 15: 86 16: 53
16: 86 17: 36 17: 86 18: 86 19: 86 20: 53
20: 86 21: 36 21: 86 22: 0 22: 86 23: 86
24: 86 25: 86 26: 86 27: 53 27: 86 28: 36
28: 86 29: 86 30: 86 31: 86 32: 86 33: 36
33: 86 34: 86 35: 53 35: 86 36: 86 37: 86
38: 86 39: 86 40: 86 41: 86 42: 86 43: 86
44: 86 45: 86 46: 86 47: 86 48: 86 49: 86
50: 86 51: 0 51: 86
There are 51 hits at base# 86



HinfI Gantc 43

2: 2 3: 2 4: 2 5: 2 6: 2 7: 2
8: 2 9: 2 9: 22 10: 2 11: 2 15: 2
16: 2 17: 2 18: 2 19: 2 19: 22 20: 2
21: 2 23: 2 24: 2 25: 2 26: 2 27: 2
28: 2 29: 2 30: 2 31: 2 32: 2 33: 2
33: 22 34: 22 35: 2 36: 2 37: 2 38: 2
40: 2 43: 2 44: 2 45: 2 46: 2 47: 2
50: 60
There are 38 hits at base# 2

MlyI GAGTCNNNNNn 18

2: 2 3: 2 4: 2 5: 2 6: 2 7: 2
8: 2 9: 2 10: 2 11: 2 37: 2 38: 2
40: 2 43: 2 44: 2 45: 2 46: 2 47: 2
There are 18 hits at base# 2

PleI gagtc 18

2: 2 3: 2 4: 2 5: 2 6: 2 7: 2
8: 2 9: 2 10: 2 11: 2 37: 2 38: 2
40: 2 43: 2 44: 2 45: 2 46: 2 47: 2
There are 18 hits at base# 2

AciI Ccgc 24

2: 26 9: 14 10: 14 11: 14 27: 74 37: 62
37: 65 38: 62 39: 65 40: 62 40: 65 41: 65
42: 65 43: 62 43: 65 44: 62 44: 65 45: 62
46: 62 47: 62 47: 65 48: 35 48: 74 49: 74
There are 8 hits at base# 62
There are 8 hits at base# 65
There are 3 hits at base# 14
There are 3 hits at base# 74
There are 1 hits at base# 26
There are 1 hits at base# 35

-"- Gcgg 11

8: 91 9: 16 10: 16 11: 16 37: 67 39: 67
40: 67 42: 67 43: 67 45: 67 46: 67
There are 7 hits at base# 67
There are 3 hits at base# 16
There are 1 hits at base# 91

BsiHKAI GWGCWc 20

2: 30 4: 30 6: 30 7: 30 9: 30 10: 30
12: 89 13: 89 14: 89 37: 51 38: 51 39: 51
40: 51 41: 51 42: 51 43: 51 44: 51 45: 51
46: 51 47: 51
There are 11 hits at base# 51

Bsp1286I GDGCHc 20

2: 30 4: 30 6: 30 7: 30 9: 30 10: 30
12: 89 13: 89 14: 89 37: 51 38: 51 39: 51
40: 51 41: 51 42: 51 43: 51 44: 51 45: 51
46: 51 47: 51
There are 11 hits at base# 51

HgiAI GWGCWc 20

2: 30 4: 30 6: 30 7: 30 9: 30 10: 30
12: 89 13: 89 14: 89 37: 51 38: 51 39: 51
40: 51 41: 51 42: 51 43: 51 44: 51 45: 51
46: 51 47: 51
There are 11 hits at base# 51



BsoFI GCngc 26

2: 53 3: 53 5: 53 6: 53 7: 53 8: 53
8: 91 9: 53 10: 53 11: 53 31: 53 36: 36
37: 64 39: 64 40: 64 41: 64 42: 64 43: 64
44: 64 45: 64 46: 64 47: 64 48: 53 49: 53
50: 45 51: 53
There are 13 hits at base# 53
There are 10 hits at base# 64



TseI Gcwgc 17

2: 53 3: 53 5: 53 6: 53 7: 53 8: 53
9: 53 10: 53 11: 53 31: 53 36: 36 45: 64
46: 64 48: 53 49: 53 50: 45 51: 53
There are 13 hits at base# 53












MnlI gagg 34

3: 67 3: 95 4: 51 5: 16 5: 67 6: 67
7: 67 8: 67 9: 67 10: 67 11: 67 15: 67
16: 67 17: 67 19: 67 20: 67 21: 67 22: 67
23: 67 24: 67 25: 67 26: 67 27: 67 28: 67
29: 67 30: 67 31: 67 32: 67 33: 67 34: 67
35: 67 36: 67 50: 67 51: 67
There are 31 hits at base# 67



HpyCH4V TGca 34

5: 90 6: 90 11: 90 12: 90 13: 90 14: 90
15: 44 16: 44 16: 90 17: 44 18: 90 19: 44
20: 44 21: 44 22: 44 23: 44 24: 44 25: 44
26: 44 27: 44 27: 90 28: 44 29: 44 33: 44
34: 44 35: 44 35: 90 36: 38 48: 44 49: 44
50: 44 50: 90 51: 44 51: 52
There are 21 hits at base# 44
There are 1 hits at base# 52



AccI GTmkac 13 5-base recognition

7: 37 11: 24 37: 16 38: 16 39: 16 40: 16
41: 16 42: 16 43: 16 44: 16 45: 16 46: 16
47: 16
There are 11 hits at base# 16

SacII CCGCgg 8 6-base recognition

9: 14 10: 14 11: 14 37: 65 39: 65 40: 65
42: 65 43: 65
There are 5 hits at base# 65
There are 3 hits at base# 14

TfiI Gawtc 24

9: 22 15: 2 16: 2 17: 2 18: 2 19: 2
19: 22 20: 2 21: 2 23: 2 24: 2 25: 2
26: 2 27: 2 28: 2 29: 2 30: 2 31: 2
32: 2 33: 2 33: 22 34: 22 35: 2 36: 2
There are 20 hits at base# 2

BsmAI Nnnnnngagac 19

15: 11 16: 11 20: 11 21: 11 22: 11 23: 11
24: 11 25: 11 26: 11 27: 11 28: 11 28: 56
30: 11 31: 11 32: 11 35: 11 36: 11 44: 87
48: 87
There are 16 hits at base# 11

BpmI ctccag 19

15: 12 16: 12 17: 12 18: 12 20: 12 21: 12
22: 12 23: 12 24: 12 25: 12 26: 12 27: 12
28: 12 30: 12 31: 12 32: 12 34: 12 35: 12
36: 12
There are 19 hits at base# 12



XmnI GAANNnnttc 12

37: 30 38: 30 39: 30 40: 30 41: 30 42: 30
43: 30 44: 30 45: 30 46: 30 47: 30 50: 30
There are 12 hits at base# 30

BsrI NCcagt 12

37: 32 38: 32 39: 32 40: 32 41: 32 42: 32
43: 32 44: 32 45: 32 46: 32 47: 32 50: 32
There are 12 hits at base# 32

BanII GRGCYc 11

37: 51 38: 51 39: 51 40: 51 41: 51 42: 51
43: 51 44: 51 45: 51 46: 51 47: 51
There are 11 hits at base# 51

Ecl136I GAGctc 11

37: 51 38: 51 39: 51 40: 51 41: 51 42: 51
43: 51 44: 51 45: 51 46: 51 47: 51
There are 11 hits at base# 51

SacI GAGCTc 11

37: 51 38: 51 39: 51 40: 51 41: 51 42: 51
43: 51 44: 51 45: 51 46: 51 47: 51
There are 11 hits at base# 51
Table 3: Synthetic 3-23 FR3 of human heavy chains showing positions of possible cleavage sites

! Sites engineered into the synthetic gene are shown in upper case DNA
! with the RE name between vertical bars (as in | XbaI | ).
! RERSs frequently found in GLGs are shown below the synthetic sequence
! with the name to the right (as in gtn ac = MaeIII(24), indicating that
! 24 of the 51 GLGs contain the site).
!
!              |---FR3---
!               89  90  (codon # in synthetic 3-23)
!                R   F
               |cgc|ttc| 6
! Allowed DNA  |cgn|tty|
!              |agr|
!   ga ntc = HinfI(38)
!   ga gtc = PleI(18)
!   ga wtc = TfiI(20)
!   gtn ac = MaeIII(24)
!   gts ac = Tsp45I(21)
!   tc acc = HphI(44)
!
!         --------FR3----------
!          91  92  93  94  95  96  97  98  99 100 101 102 103 104 105
!           T   I   S   R   D   N   S   K   N   T   L   Y   L   Q   M
          |act|atc|TCT|AGA|gac|aac|tct|aag|aat|act|ctc|tac|ttg|cag|atg| 51
! allowed |acn|ath|tcn|cgn|gay|aay|tcn|aar|aay|acn|ttr|tay|ttr|car|atg|
!                 |agy|agr|       |agy|           |ctn|   |ctn|
!   ga gac = BsmAI(16)
!   ag ct = AluI(23)
!   c tcc ag = BpmI(19)
!   g ctn agc = BlpI(21)
!   g aan nnn ttc = XmnI(12)
!   | XbaI |
!   tg ca = HpyCH4V(21)
!
!         --------FR3----------->|
!         106 107 108 109 110 111 112 113 114 115 116 117 118 119 120
!           N   S   L   R   A   E   D   T   A   V   Y   Y   C   A   K
          |aac|agC|TTA|AGg|gct|gag|gac|aCT|GCA|Gtc|tac|tat|tgc|gct|aaa| 96
! allowed |aay|tcn|ttr|cgn|gcn|gar|gay|acn|gcn|gtn|tay|tay|tgy|gcn|aar|
!             |agy|ctn|agr|
!   cc nng g = BsaJI(23)
!   ac ngt = Bst4CI(51)
!   aga tct = BglII(10)
!   ac ngt = HpyCH4III(51)
!   Rga tcY = BstYI(11)
!   ac ngt = TaaI(51)
!   c ayn nnn rtg = MslI(44)
!   cg ryc g = BsiEI(23)
!   yg gcc r = EaeI(23)
!   cg gcc g = EagI(23)
!   g gcc = HaeIII(25)
!   gag g = MnlI(31)
!   | AflII |   | PstI |
Table 4: REdaptors, Extenders, and Bridges used for Cleavage and Capture of Human Heavy Chains in FR3.

A: HpyCH4V Probes of actual human HC genes

! HpyCH4V in FR3 of human HC, bases 35-56; only those with TGca site
TGca; 10,
RE recognition: tgca of length 4 is expected at 10

6-1 agttctccctgcagctgaactc
3-11, 3-07, 3-21, 3-72, 3-48 cactgtatctgcaaatgaacag
3-09, 3-43, 3-20 ccctgtatctgcaaatgaacag
5-51 ccgcctacctgcagtggagcag
3-15, 3-30, 3-30.5, 3-30.3, 3-74, 3-23, 3-33 cgctgtatctgcaaatgaacag
7-4.1 cggcatatctgcagatctgcag
3-73 cggcgtatctgcaaatgaacag
5-a ctgcctacctgcagtggagcag
3-49 tcgcctatctgcaaatgaacag
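
The statement that the tgca site "is expected at 10" can be checked by scanning each 22-base probe for the recognition sequence. The following is a minimal sketch, assuming 1-based positions within the probe and standard IUPAC ambiguity codes for degenerate sites; the helper name `site_positions` is hypothetical and is not the search program used to build the tables.

```python
import re

# Standard IUPAC nucleotide codes mapped to regular-expression classes.
IUPAC = {
    "a": "a", "c": "c", "g": "g", "t": "t",
    "r": "[ag]", "y": "[ct]", "s": "[cg]", "w": "[at]",
    "k": "[gt]", "m": "[ac]", "b": "[cgt]", "d": "[agt]",
    "h": "[act]", "v": "[acg]", "n": "[acgt]",
}

def site_positions(site, seq):
    """Return the 1-based start of every match of `site` in `seq`.

    Case is ignored, so the upper/lower case cut-point notation of the
    tables (e.g. TGca) does not affect matching. A lookahead is used so
    that overlapping occurrences are also reported.
    """
    pattern = "".join(IUPAC[ch] for ch in site.lower())
    return [m.start() + 1 for m in re.finditer(f"(?=({pattern}))", seq.lower())]

probes = {
    "6-1": "agttctccctgcagctgaactc",
    "5-51": "ccgcctacctgcagtggagcag",
    "7-4.1": "cggcatatctgcagatctgcag",
}

for name, seq in probes.items():
    print(name, site_positions("TGca", seq))
```

A probe such as 7-4.1 carries a second tgca further downstream, which is the kind of additional site that has to be kept in mind when a REdaptor is designed for it.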
B: HpyCH4V REdaptors, Extenders, and Bridges

B.1 REdaptors
! Cutting HC lower strand:
! TmKeller for 100 mM NaCl, zero formamide
! Edapters for cleavage
                                                  Tm W   Tm K
(ON_HCFR36-1)    5'-agttctcccTGCAgctgaactc-3'    68.0   64.5
(ON_HCFR36-1A)   5'-ttctcccTGCAgctgaactc-3'      62.0   62.5
(ON_HCFR36-1B)   5'-ttctcccTGCAgctgaac-3'        56.0   59.9
(ON_HCFR33-15)   5'-cgctgtatcTGCAaatgaacag-3'    64.0   60.8
(ON_HCFR33-15A)  5'-ctgtatcTGCAaatgaacag-3'      56.0   56.3
(ON_HCFR33-15B)  5'-ctgtatcTGCAaatgaac-3'        50.0   53.1
(ON_HCFR33-11)   5'-cactgtatcTGCAaatgaacag-3'    62.0   58.9
(ON_HCFR35-51)   5'-ccgcctaccTGCAgtggagcag-3'    74.0   70.1
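
The "Tm W" values listed for the HpyCH4V REdaptors agree with the simple Wallace rule, 2 °C for each A or T and 4 °C for each G or C. Reading "W" as Wallace is an assumption made here, and the function name `wallace_tm` is illustrative; the "Tm K" (Keller) column depends on salt and formamide terms that this sketch does not attempt to reproduce.

```python
def wallace_tm(oligo):
    """Estimate Tm in degrees C by the Wallace rule: 2*(A+T) + 4*(G+C).

    Case is ignored so that the upper-case recognition site in a
    REdaptor sequence is counted like any other base.
    """
    s = oligo.lower()
    at = s.count("a") + s.count("t")
    gc = s.count("g") + s.count("c")
    return 2 * at + 4 * gc

# REdaptor sequences and Tm W values as listed in section B.1.
redaptors = {
    "ON_HCFR36-1": ("agttctcccTGCAgctgaactc", 68),
    "ON_HCFR36-1A": ("ttctcccTGCAgctgaactc", 62),
    "ON_HCFR36-1B": ("ttctcccTGCAgctgaac", 56),
    "ON_HCFR33-15": ("cgctgtatcTGCAaatgaacag", 64),
    "ON_HCFR33-15A": ("ctgtatcTGCAaatgaacag", 56),
    "ON_HCFR33-15B": ("ctgtatcTGCAaatgaac", 50),
    "ON_HCFR33-11": ("cactgtatcTGCAaatgaacag", 62),
    "ON_HCFR35-51": ("ccgcctaccTGCAgtggagcag", 74),
}

for name, (seq, listed) in redaptors.items():
    print(name, len(seq), wallace_tm(seq), listed)
```

Shortening a REdaptor from 22 to 18 bases lowers the estimate by 12 to 14 °C in these examples, which is the trade-off the A and B variants explore.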



B.2 Segment of synthetic 3-23 gene into which captured CDR3 is to be cloned

                    XbaI...
D323* cgCttcacTaag tcT aga gac aaC tcT aag aaT acT ctC taC
scab designed gene 3-23 gene

HpyCH4V
                    AflII...
Ttg caG atg aac agC TtA agG ...



B.3 Extender and Bridges

! Extender (bottom strand):
!
(ON_HCHpyEx01) 5'-cAAgTAgAgAgTATTcTTAgAgTTgTcTcTAgAcTTAgTgAAgcg-3'
! ON_HCHpyEx01 is the reverse complement of
! 5'-cgCttcacTaag tcT aga gac aaC tcT aag aaT acT ctC taC Ttg-3'
! Bridges (top strand, 9-base overlap):
!
(ON_HCHpyBr016-1) 5'-cgCttcacTaag tcT aga gac aaC tcT aag-
                  aaT acT ctC taC Ttg CAgctgaac-3' (3'-term C is blocked)
!
! 3-15 et al. + 3-11
(ON_HCHpyBr023-15) 5'-cgCttcacTaag tcT aga gac aaC tcT aag-
                  aaT acT ctC taC Ttg CAaatgaac-3' (3'-term C is blocked)
!
! 5-51
(ON_HCHpyBr045-51) 5'-cgCttcacTaag tcT aga gac aaC tcT aag-
                  aaT acT ctC taC Ttg CAgtggagc-3' (3'-term C is blocked)
!
! PCR primer (top strand)
(ON_HCHpyPCR) 5'-cgCttcacTaag tcT aga gac-3'
!
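
The relationship stated for the extender, that ON_HCHpyEx01 is the reverse complement of the top-strand segment of the synthetic 3-23 gene, can be verified mechanically. This is a small sketch with a hypothetical helper name; the top-strand string is transcribed from the listing above with spaces removed and with the codon after tcT read as aga (the XbaI site).

```python
# Translation table for complementing DNA while preserving letter case.
COMPLEMENT = str.maketrans("acgtACGT", "tgcaTGCA")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]

# Top-strand segment and extender as listed in section B.3.
top_strand = "cgCttcacTaag tcT aga gac aaC tcT aag aaT acT ctC taC Ttg".replace(" ", "")
extender = "cAAgTAgAgAgTATTcTTAgAgTTgTcTcTAgAcTTAgTgAAgcg"

# Compare case-insensitively: the listing uses case for emphasis only.
print(reverse_complement(top_strand).lower() == extender.lower())
```

The same check applies to any extender in this table that is described as the reverse complement of a listed top-strand sequence.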

C: BlpI Probes from human HC GLGs 

1 1-58, 1-03, 1-08,1-69, 1-24, 1-45, 1-46, 1-f, 1-e 
acatggaGCTGAGCagcctgag 

2 1-02 
acatggaGCTGAGCaggctgag 

3 1-18 
acatggagctgaggagcctgag 



acctgcagtggagcagcctgaa 

5 3-15,3-73,3-49,3-72 
atctgcaaatgaacagcctgaa 

6 3303,3-33,3-07,3-11,3-30,3-21,3-23,3305,3-48 
atctgcaaatgaacagcctgag 

7 3-20,3-74,3-09,3-43 
atctgcaaatgaacagtctgag 

8 7-4.1
atctgcagatctgcagcctaaa 

9 3-66, 3-13, 3-53, 3-d 
atcttcaaatgaacagcctgag 

10 3-64 
atcttcaaatgggcagcctgag 

11 4301,4-28, 4302,4-04, 4304,4-31,4-34,4-39, 4-59, 4-61, 4-b 
ccctgaaGCTGAGCtctgtgac 

12 6-1 
ccctgcagctgaactctgtgac 

13 2-70,2-05 
tccttacaatgaccaacatgga 

14 2-26 
tccttaccatgaccaacatgga 

D: BlpI REdaptors, Extenders, and Bridges
D.1 REdaptors
                                                     Tm W   Tm K
(BlpF3HC1-58) 5'-ac atg gaG CTG AGC agc ctg ag-3'    70     66.4
(BlpF3HC6-1)  5'-cc ctg aag ctg agc tct gtg ac-3'    70     66.4

BlpF3HC6-1 matches 4-30.1, not 6-1.



D.2 Segment of synthetic 3-23 gene into which captured CDR3 is to be cloned

! BlpI
! XbaI...
!D323* cgCttcacTaag TCT AGA gac aaC tcT aag aaT acT ctc taC Ttg caG atg aac
! AflII...
       agC TTA AGG



D.3 Extender and Bridges

! Bridges
(BlpF3Br1) 5'-cgCttcacTcag tcT aga gaT aaC AGT aaA aaT acT TtG-
           taC Ttg caG Ctg a|GC agc ctg-3'
(BlpF3Br2) 5'-cgCttcacTcag tcT aga gaT aaC AGT aaA aaT acT TtG-
           taC Ttg caG Ctg a|gc tct gtg-3'
!                           | lower strand is cut here
! Extender
(BlpF3Ext) 5'-TcAgcTgcAAgTAcAAAgTATTTTTAcTgTTATcTcTAgAcTgAgTgAAgcg-3'
! BlpF3Ext is the reverse complement of:
! 5'-cgCttcacTcag tcT aga gaT aaC AGT aaA aaT acT TtG taC Ttg caG Ctg a-3'
!
(BlpF3PCR) 5'-cgCttcacTcag tcT aga gaT aaC-3'



E: HpyCH4III Distinct GLG sequences surrounding site, bases 77-98

1 102#1, 118#4, 146#7, 169#9, 1e#10, 311#17, 353#30, 404#37, 4301
ccgtgtattactgtgcgagaga
2 103#2, 307#15, 321#21, 3303#24, 333#26, 348#28, 364#31, 366#32
ctgtgtattactgtgcgagaga
3 108#3
ccgtgtattactgtgcgagagg
4 124#5, 1f#11
ccgtgtattactgtgcaacaga
5 145#6
ccatgtattactgtgcaagata
6 158#8
ccgtgtattactgtgcggcaga
7 205#12
ccacatattactgtgcacacag
8 226#13
ccacatattactgtgcacggat
9 270#14
ccacgtattactgtgcacggat
10 309#16, 343#27
ccttgtattactgtgcaaaaga
11 313#18, 374#35, 61#50
ctgtgtattactgtgcaagaga
12 315#19
ccgtgtattactgtaccacaga
13 320#20
ccttgtatcactgtgcgagaga
14 323#22
ccgtatattactgtgcgaaaga
15 330#23, 3305#25
ctgtgtattactgtgcgaaaga
16 349#29
ccgtgtattactgtactagaga
17 372#33
ccgtgtattactgtgctagaga
18 373#34
ccgtgtattactgtactagaca
19 3d#36
ctgtgtattactgtaagaaaga
20 428#38
ccgtgtattactgtgcgagaaa
21 4302#40, 4304#41
ccgtgtattactgtgccagaga
22 439#44
ctgtgtattactgtgcgagaca
23 551#48
ccatgtattactgtgcgagaca
24 5a#49
ccatgtattactgtgcgaga



F: HpyCH4III REdaptors, Extenders, and Bridges
F.1 REdaptors

ONs for cleavage of HC (lower) in FR3 (bases 77-97)
For cleavage with HpyCH4III, Bst4CI, or TaaI
cleavage is in lower chain before base 88.

                        77 788 888 888 889 999 999 9
                        78 901 234 567 890 123 456 7      Tm W  Tm K
(H43.77.97.1-02#1)   5'-cc gtg tat tAC TGT gcg aga g-3'   64    62.6
(H43.77.97.1-03#2)   5'-ct gtg tat tAC TGT gcg aga g-3'   62    60.6
(H43.77.97.108#3)    5'-cc gtg tat tAC TGT gcg aga g-3'   64    62.6
(H43.77.97.323#22)   5'-cc gta tat tac tgt gcg aaa g-3'   60    58.7
(H43.77.97.330#23)   5'-ct gtg tat tac tgt gcg aaa g-3'   60    58.7
(H43.77.97.439#44)   5'-ct gtg tat tac tgt gcg aga c-3'   62    60.6
(H43.77.97.551#48)   5'-cc atg tat tac tgt gcg aga c-3'   62    60.
(H43.77.97.5a#49)    5'-cc atg tat tAC TGT gcg aga |-3'   58    58.

F.2 Extender and Bridges

! XbaI and AflII sites in bridges are bunged
(H43.XABr1) 5'-ggtgtagtga-
    |TCT|AGt|gac|aac|tct|aag|aat|act|ctc|tac|ttg|cag|atg|-
    |aac|agC|TTt|AGg|gct|gag|gac|aCT|GCA|Gtc|tac|tat tgt gcg aga-3'
(H43.XABr2) 5'-ggtgtagtga-
    |TCT|AGt|gac|aac|tct|aag|aat|act|ctc|tac|ttg|cag|atg|-
    |aac|agC|TTt|AGg|gct|gag|gac|aCT|GCA|Gtc|tac|tat tgt gcg aaa-3'
(H43.XAExt) 5'-ATAgTAgAcT gcAgTgTccT cAgcccTTAA gcTgTTcATc TgcAAgTAgA-
    gAgTATTcTT AgAgTTgTcT cTAgATcAcT AcAcc-3'
! H43.XAExt is the reverse complement of
! 5'-ggtgtagtga-
! |TCT|AGA|gac|aac|tct|aag|aat|act|ctc|tac|ttg|cag|atg|-
! |aac|agC|TTA|AGg|gct|gag|gac|aCT|GCA|Gtc|tac|tat-3'
(H43.XAPCR) 5'-ggtgtagtga|TCT|AGA|gac|aac-3'
! XbaI and AflII sites in bridges are bunged
(H43.ABr1) 5'-ggtgtagtga-
    |aac|agC|TTt|AGg|gct|gag|gac|aCT|GCA|Gtc|tac|tat tgt gcg aga-3'
(H43.ABr2) 5'-ggtgtagtga-
    |aac|agC|TTt|AGg|gct|gag|gac|aCT|GCA|Gtc|tac|tat tgt gcg aaa-3'
(H43.AExt) 5'-ATAgTAgAcTgcAgTgTccTcAgcccTTAAgcTgTTTcAcTAcAcc-3'
! (H43.AExt) is the reverse complement of 5'-ggtgtagtga-
! |aac|agC|TTA|AGg|gct|gag|gac|aCT|GCA|Gtc|tac|tat-3'
(H43.APCR) 5'-ggtgtagtga|aac|agC|TTA|AGg|gct|g-3'
24 213 26 56 60 42 20 7 2 0 0 204 5a#49
ccatgtattactgtgcgagaAA ..a.................AA

Group 337 471 363 218 130 58 23 11 6
Cumulative 337 808 1171 1389 1519 1577 1600 1611 1617

Seqs with the expected RE site only 1511
Seqs with only an unexpected site 0
Seqs with both expected and unexpected 8
Seqs with no sites 0
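
The "Cumulative" row is the running total of the "Group" row, which counts sequences by number of mismatches (0 through 8+) to their closest REdaptor. A minimal sketch of that arithmetic follows; the variable names are illustrative. It also checks one observation about the listed numbers, offered as an observation only: the cumulative count through four mismatches equals the number of sequences reported with the expected site (1511 with the expected site only plus 8 with both).

```python
from itertools import accumulate

# "Group" row: sequences with 0, 1, ..., 7, and 8+ mismatches.
group = [337, 471, 363, 218, 130, 58, 23, 11, 6]

# Running total, as printed in the "Cumulative" row.
cumulative = list(accumulate(group))
print(cumulative)

# Sequences with at most four mismatches (index 4 is the 4-mismatch bin).
print(cumulative[4], 1511 + 8)
```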






Table 5D: Analysis repeated using only 8 best REdaptors

Id Ntot 0 1 2 3 4 5 6 7 8+
1 301 78 101 54 32 16 9 10 1 0 281 102#1 ccgtgtattactgtgcgagaga
2 493 69 155 125 73 37 14 11 3 6 459 103#2 ctgtgtattactgtgcgagaga
3 189 52 45 38 23 18 5 4 1 3 176 108#3 ccgtgtattactgtgcgagagg
4 127 29 23 28 24 10 6 5 2 0 114 323#22 ccgtatattactgtgcgaaaga
5 78 21 25 14 11 1 4 2 0 0 72 330#23 ctgtgtattactgtgcgaaaga
6 79 15 17 25 8 11 1 2 0 0 76 439#44 ctgtgtattactgtgcgagaca
7 43 14 15 5 5 3 0 1 0 0 42 551#48 ccatgtattactgtgcgagaca
8 307 26 63 72 51 38 24 14 13 6 250 5a#49 ccatgtattactgtgcgaga

1 102#1 ccgtgtattactgtgcgagaga ccgtgtattactgtgcgagaga
2 103#2 ctgtgtattactgtgcgagaga .t....................
3 108#3 ccgtgtattactgtgcgagagg .....................g
4 323#22 ccgtatattactgtgcgaaaga ....a.............a...
5 330#23 ctgtgtattactgtgcgaaaga .t................a...
6 439#44 ctgtgtattactgtgcgagaca .t..................c.
7 551#48 ccatgtattactgtgcgagaca ..a.................c.
8 5a#49 ccatgtattactgtgcgagaAA ..a.................AA

Seqs with the expected RE site only 1463 / 1617
Seqs with only an unexpected site 0
Seqs with both expected and unexpected 7
Seqs with no sites 0
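
The second half of Table 5D shows each REdaptor target against the first one (102#1), with a dot wherever the two agree. The sketch below reproduces that notation and counts mismatches; the function names are hypothetical, and the exact-character comparison (so that the upper-case AA of 5a#49 is always printed) is an assumption chosen to match the rows as listed.

```python
def dot_diff(reference, other):
    """Show `other` against `reference`, printing '.' where characters agree."""
    return "".join("." if a == b else b for a, b in zip(reference, other))

def mismatches(reference, other):
    """Count positions where the two sequences differ, ignoring case."""
    return sum(a.lower() != b.lower() for a, b in zip(reference, other))

reference = "ccgtgtattactgtgcgagaga"  # 102#1
targets = {
    "103#2": "ctgtgtattactgtgcgagaga",
    "323#22": "ccgtatattactgtgcgaaaga",
    "5a#49": "ccatgtattactgtgcgagaAA",
}

for name, seq in targets.items():
    print(name, seq, dot_diff(reference, seq), mismatches(reference, seq))
```

Counting mismatches between an observed sequence and each REdaptor in this way is what allows a sequence to be assigned to its closest REdaptor and binned in the 0 to 8+ columns.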



Table 6: Human HC GLG FR1 Sequences 

VH Exon - Nucleotide sequence alignment 

VH1 



1 -0? 


CAG GTG 


CAG 


CTG 


GTG 


CAG 


TCT 


GGG 


GCT 


GAG 


GTG 


AAG AAG 


CCT 


GGG 


GCC 


TCA 


GTG AAG 






TGC 


AAG 


GCT 


TCT 


GGA 


TAC 


ACC 


TTC 


ACC 














1-03 


cacj gtC 


cag 


ctT 


gtg 




tct 


9*99* 


get 


gag 


gtg 


aag aag 


cet 


ggg 


gee 


tea 


gtg aag 




gtT tcc 


t gc 


aag 


get 


Lv_ L 


gga 


far 


acc 


ttc 


acT 
















cdy y'-y 


cag 


ctg 


gtg 


cag 


LL L 


999 


get 


gag gtg 


aag aag 


cct 


ggg 


gee 


tea 


gtg aag 






tgc 


aag 


get 


LLL 


gga 


t ac 




ttc 


acc 














1 _1 R 
J. J. O 


Cag gtT 


cag 


ctg 


gtg 




tct 


99 A 


get 


gag 


gtg 


aag aag 




999 


gee 


tea 


gtg aag 




iff- /-< t~ r* r* 

y l.c ill 


tgc 


aag 


get 


ILL 


ggT 




ace 


ttT 


acc 














1-24 


ran rrt" r* 


cag 


ctg 


gtA 




tct 


999 


get 


gag 


gtg 


day aavj 


cct 


yyy 


gee 


tea 


gtg aag 




gtc tcc 


tgc 


aag 


git 


ILL 


gga 


far 


acc 


Ctc 


acT 














1 


cag Atg 


cag 


ctg 


gtg 


cag 


*- ~ *- 

tcc 


999 


get 


gag 


gtg 


aag aag 


AC U 


ggg 


Tcc 


tea 


gtg aag 




gtT tcc 


tgc 


aag 


get 




gga 


r ac 


acc 


ttc 


acc 














1 yic 
l-Hb 


cag gtg 


cag 


ctg 


gtg 


cag 


tct 


ggg 


get 


gag 


gtg 


aag aag 


cct 


ggg 


gee 


tea 


gtg aag 




gtT tcc 


tgc 


aag 


gcA 


♦■*•»+■ 
LC L 


gga 




acc 


ttc 


acc 














1 DO 


CdA Aug 


cag 


ctg 


gtg 


cag 


LCL 


999 


LCL 


gag gtg 


aag aag 


cct 


ggg 


Acc 


tea 


gtg aag 




gtc tcc 


tgc 


aag 


get 


LL L 


gga 


t-Tr 
L1L 


acc 


ttT 


acT 
















cag gtg 


cag 


ctg 


gtg 


cag 




999 


get 


gag 


gtg 


aag aag 


cct 


ggg 


Tcc 


tcG gtg aag 




gtc tcc 


tgc 


aag 


get 


uCu 


gg a 




acc 


ttc 


aGc 














1 — A 

i-e 


cag gt,g 


cag 


ctg 


gtg 


cag 


~ t- 

uCu 


999 


get 


gag gtg 


aag aag 


cct 


ggg 


Tcc 


tcG gtg aag 




gtc tcc 


tgc 


aag 


get 


LCL 


gga 




acc 


ttc 


aGc 














1 -f 


vjciu y L\> 


cag 


ctg 


gtA 




tct 


ggg 


get 


gag 


gtg 


acty ctcty 


cct 


yyy 


gcT Aca 


gtg aaA 






tgc 


aag 


y l l. 




gga 


far 


acc 


ttc 


acc 














v nz. 


































Z UJ 


pari afr 

Unu /AIL, 


ACC 


TTG 


AAG 


GAG 


TCT 


GGT 


CCT 


ACG 


CTG 


GTG AAA 


ccc 


ACA 


CAG 


ACC 


CTC ACG 




ctg act 

LIU /\L L 


TGC 


arr 


TTC 


TCT 


GGG 


TTC 


TCA 


CTC 


AGC 














2-26 


Lay oll 


acc 


ttg 




gag 


tct 


ggt 




GTg 


ctg 


gtg aaa 


ccc 


aca 


Gag 


acc 


ctc acg 




*-y qu^> 


tgc 


acc 


Gtc 


tct 


firrrr 

yyy 


ttc 


tea 


ctc 


age 














2- /Q 


cag Gtc 


acc 


ttg 


aag 


gag 


tct 


ggt 


cct 


Gcg 


ctg 


gtg aaa 


ccc 


aca 


cag 


acc 


ctc acA 




ctg acc 


tgc 


acc 


ttc 


tct 


ggg 


ttc 


tea 


ctc 


age 














VH3

3-07 GAG GTG CAG CTG GTG GAG TCT GGG GGA GGC TTG GTC CAG CCT GGG GGG TCC CTG AGA
CTC TCC TGT GCA GCC TCT GGA TTC ACC TTT AGT
3-09 gaA gtg cag ctg gtg gag tct ggg gga ggc ttg gtA cag cct ggC Agg tcc ctg aga
ctc tcc tgt gca gcc tct gga ttc acc ttt GAt
3-11 Cag gtg cag ctg gtg gag tct ggg gga ggc ttg gtc Aag cct ggA ggg tcc ctg aga
ctc tcc tgt gca gcc tct gga ttc acc ttc agt
3-13 gag gtg cag ctg gtg gag tct ggg gga ggc ttg gtA cag cct ggg ggg tcc ctg aga
ctc tcc tgt gca gcc tct gga ttc acc ttc agt
3-15 gag gtg cag ctg gtg gag tct ggg gga ggc ttg gtA Aag cct ggg ggg tcc ctT aga
ctc tcc tgt gca gcc tct gga ttc acT ttc agt
3-20 gag gtg cag ctg gtg gag tct ggg gga ggT Gtg gtA cGg cct ggg ggg tcc ctg aga
ctc tcc tgt gca gcc tct gga ttc acc ttt GAt
3-21 gag gtg cag ctg gtg gag tct ggg gga ggc Ctg gtc Aag cct ggg ggg tcc ctg aga
ctc tcc tgt gca gcc tct gga ttc acc ttc agt
3-23 gag gtg cag ctg Ttg gag tct ggg gga ggc ttg gtA cag cct ggg ggg tcc ctg aga
ctc tcc tgt gca gcc tct gga ttc acc ttt agC
3-30 Cag gtg cag ctg gtg gag tct ggg gga ggc Gtg gtc cag cct ggg Agg tcc ctg aga
ctc tcc tgt gca gcc tct gga ttc acc ttc agt
3-30.3 Cag gtg cag ctg gtg gag tct ggg gga ggc Gtg gtc cag cct ggg Agg tcc ctg aga
ctc tcc tgt gca gcc tct gga ttc acc ttc agt
3-30.5 Cag gtg cag ctg gtg gag tct ggg gga ggc Gtg gtc cag cct ggg Agg tcc ctg aga
ctc tcc tgt gca gcc tct gga ttc acc ttc agt
3-33 Cag gtg cag ctg gtg gag tct ggg gga ggc Gtg gtc cag cct ggg Agg tcc ctg aga
ctc tcc tgt gca gcG tct gga ttc acc ttc agt
3-43 gaA gtg cag ctg gtg gag tct ggg gga gTc Gtg gtA cag cct ggg ggg tcc ctg aga
ctc tcc tgt gca gcc tct gga ttc acc ttt GAt
3-48 gag gtg cag ctg gtg gag tct ggg gga ggc ttg gtA cag cct ggg ggg tcc ctg aga
ctc tcc tgt gca gcc tct gga ttc acc ttc agt
3-49 gag gtg cag ctg gtg gag tct ggg gga ggc ttg gtA cag ccA ggg Cgg tcc ctg aga
ctc tcc tgt Aca gcT tct gga ttc acc ttt Ggt
3-53 gag gtg cag ctg gtg gag Act ggA gga ggc ttg Atc cag cct ggg ggg tcc ctg aga
ctc tcc tgt gca gcc tct ggG ttc acc GtC agt
3-64 gag gtg cag ctg gtg gag tct ggg gga ggc ttg gtc cag cct ggg ggg tcc ctg aga
ctc tcc tgt gca gcc tct gga ttc acc ttc agt
3-66 gag gtg cag ctg gtg gag tct ggg gga ggc ttg gtc cag cct ggg ggg tcc ctg aga
ctc tcc tgt gca gcc tct gga ttc acc GtC agt
3-72 gag gtg cag ctg gtg gag tct ggg gga ggc ttg gtc cag cct ggA ggg tcc ctg aga
ctc tcc tgt gca gcc tct gga ttc acc ttc agt
3-73 gag gtg cag ctg gtg gag tct ggg gga ggc ttg gtc cag cct ggg ggg tcc ctg aAa
ctc tcc tgt gca gcc tct ggG ttc acc ttc agt
3-74 gag gtg cag ctg gtg gag tcC ggg gga ggc ttA gtT cag cct ggg ggg tcc ctg aga
ctc tcc tgt gca gcc tct gga ttc acc ttc agt
3-d gag gtg cag ctg gtg gag tct Cgg gga gTc ttg gtA cag cct ggg ggg tcc ctg aga
ctc tcc tgt gca gcc tct gga ttc acc GtC agt

VH4










4-04 CAG GTG CAG CTG CAG GAG TCG GGC CCA GGA CTG GTG AAG CCT TCG GGG ACC CTG TCC
CTC ACC TGC GCT GTC TCT GGT GGC TCC ATC AGC
4-28 cag gtg cag ctg cag gag tcg ggc cca gga ctg gtg aag cct tcg gAC acc ctg tcc
ctc acc tgc gct gtc tct ggt TAc tcc atc agc
4-30.1 cag gtg cag ctg cag gag tcg ggc cca gga ctg gtg aag cct tcA CAg acc ctg tcc
ctc acc tgc Act gtc tct ggt ggc tcc atc agc
4-30.2 cag Ctg cag ctg cag gag tcC ggc Tca gga ctg gtg aag cct tcA CAg acc ctg tcc
ctc acc tgc gct gtc tct ggt ggc tcc atc agc
4-30.4 cag gtg cag ctg cag gag tcg ggc cca gga ctg gtg aag cct tcA CAg acc ctg tcc
ctc acc tgc Act gtc tct ggt ggc tcc atc agc
4-31 cag gtg cag ctg cag gag tcg ggc cca gga ctg gtg aag cct tcA CAg acc ctg tcc
ctc acc tgc Act gtc tct ggt ggc tcc atc agc
4-34 cag gtg cag ctA cag Cag tGg ggc Gca gga ctg Ttg aag cct tcg gAg acc ctg tcc
ctc acc tgc gct gtc tAt ggt ggG tcc Ttc agT
4-39 cag Ctg cag ctg cag gag tcg ggc cca gga ctg gtg aag cct tcg gAg acc ctg tcc
ctc acc tgc Act gtc tct ggt ggc tcc atc agc
4-59 cag gtg cag ctg cag gag tcg ggc cca gga ctg gtg aag cct tcg gAg acc ctg tcc
ctc acc tgc Act gtc tct ggt ggc tcc atc agT
4-61 cag gtg cag ctg cag gag tcg ggc cca gga ctg gtg aag cct tcg gAg acc ctg tcc
ctc acc tgc Act gtc tct ggt ggc tcc Gtc agc
4-b cag gtg cag ctg cag gag tcg ggc cca gga ctg gtg aag cct tcg gAg acc ctg tcc
ctc acc tgc gct gtc tct ggt TAc tcc atc agc

VH5

5-51 GAG GTG CAG CTG GTG CAG TCT GGA GCA GAG GTG AAA AAG CCC GGG GAG TCT CTG AAG
ATC TCC TGT AAG GGT TCT GGA TAC AGC TTT ACC
5-a gaA gtg cag ctg gtg cag tct gga gca gag gtg aaa aag ccc ggg gag tct ctg aGg
atc tcc tgt aag ggt tct gga tac agc ttt acc

VH6

6-1 CAG GTA CAG CTG CAG CAG TCA GGT CCA GGA CTG GTG AAG CCC TCG CAG ACC CTC TCA
CTC ACC TGT GCC ATC TCC GGG GAC AGT GTC TCT

VH7

7-4.1 CAG GTG CAG CTG GTG CAA TCT GGG TCT GAG TTG AAG AAG CCT GGG GCC TCA GTG AAG
GTT TCC TGC AAG GCT TCT GGA TAC ACC TTC ACT
Table 7: RERS sites in Human HC GLG FR1s where there are at least 20 GLGs cut

BsgI GTGCAG 71 (cuts 16/14 bases to right)

1: 4 1: 13 2: 13 3: 4 3: 13 4: 13
6: 13 7: 4 7: 13 8: 13 9: 4 9: 13
10: 4 10: 13 15: 4 15: 65 16: 4 16: 65
17: 4 17: 65 18: 4 18: 65 19: 4 19: 65
20: 4 20: 65 21: 4 21: 65 22: 4 22: 65
23: 4 23: 65 24: 4 24: 65 25: 4 25: 65
26: 4 26: 65 27: 4 27: 65 28: 4 28: 65
29: 4 30: 4 30: 65 31: 4 31: 65 32: 4
32: 65 33: 4 33: 65 34: 4 34: 65 35: 4
35: 65 36: 4 36: 65 37: 4 38: 4 39: 4
41: 4 42: 4 43: 4 45: 4 46: 4 47: 4
48: 4 48: 13 49: 4 49: 13 51: 4
There are 39 hits at base# 4
There are 21 hits at base# 65














-"- ctgcac 9
12: 63 13: 63 14: 63 39: 63 41: 63 42: 63
44: 63 45: 63 46: 63
















BbvI GCAGC 65
1: 6 3: 6 6: 6 7: 6 8: 6 9: 6
10: 6 15: 6 15: 67 16: 6 16: 67 17: 6
17: 67 18: 6 18: 67 19: 6 19: 67 20: 6
20: 67 21: 6 21: 67 22: 6 22: 67 23: 6
23: 67 24: 6 24: 67 25: 6 25: 67 26: 6
26: 67 27: 6 27: 67 28: 6 28: 67 29: 6
30: 6 30: 67 31: 6 31: 67 32: 6 32: 67
33: 6 33: 67 34: 6 34: 67 35: 6 35: 67
36: 6 36: 67 37: 6 38: 6 39: 6 40: 6
41: 6 42: 6 43: 6 44: 6 45: 6 46: 6
47: 6 48: 6 49: 6 50: 12 51: 6

There are 43 hits at base# 6 Bolded sites very near sites listed below

There are 21 hits at base# 67

-"- gctgc 13 
37: 9 38: 9 39: 9 40: 3 40: 9 41: 9 
42: 9 44: 3 44: 9 45: 9 46: 9 47: 9 
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50: 9 

There are 11 hits at base# 9





BsoFI GCngc 78
1: 6 3: 6 6: 6 7: 6 8: 6 9: 6
10: 6 15: 6 15: 67 16: 6 16: 67 17:
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There are 43 hits at base# 6 These often occur together. 

There are 11 hits at base# 9 

20 There are 2 hits at base# 3 
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There are 43 hits at base# 6 Often together. 
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There are 2 hits at basejf 3 
There are 1 hits at base# 12 

There are 21 hits at base# 67 
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There are 22 hits at base# 52 52 and 48 never together. 
There are 9 hits at base# 48 
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HphI tcacc 



42 



1: 86 3: 86 6: 86 7: 86 8: 80 11: 86
12: 5 13: 5 14: 5 15: 80 16: 80 17: 80
18: 80 20: 80 21: 80 22: 80 23: 80 24: 80
25: 80 26: 80 27: 80 28: 80 29: 80 30: 80
31: 80 32: 80 33: 80 34: 80 35: 80 36: 80
37: 59 38: 59 39: 59 40: 59 41: 59 42: 59
43: 59 44: 59 45: 59 46: 59 47: 59 50: 59



There are 22 hits at base# 80 80 and 86 never together 
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There are 35 hits at base! 39 39 and 40 together twice. 
There are 2 hits at base# 40 
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There are 32 hits at base! 40 40 and 41 together twice 

There are 2 hits at base# 41 
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There are 35 hits at base# 40 
There are 2 hits at base# 41 
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46: 21 46: 22 47: 21 47: 22 50: 22 51: 44 

There are 23 hits at base# 47 These do not occur together 

There are 11 hits at base# 44 

There are 14 hits at base# 22 These do occur together. 
There are 9 hits at base# 21 
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There are 20 hits at base# 48 
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30: 46 31: 46 32: 46 33: 46 34: 46 35: 46 
36: 46 37: 46 43: 79 

There are 22 hits at base# 46 43 and 46 never occur together.

There are 4 hits at base# 43 
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s 

12: 16 13: 16 14: 16 15: 16 16: 16 17: 16
18: 16 19: 16 20: 16 21: 16 22: 16 23: 16
24: 16 25: 16 26: 16 27: 16 27: 26 28: 16
29: 16 31: 16 32: 16 33: 16 34: 16 35: 16
36: 16 36: 26 37: 16 38: 16 39: 16 40: 16
41: 16 42: 16 44: 16 45: 16 46: 16 47: 16
48: 46 49: 46

There are 34 hits at base# 16












-"- GACTC 21
15: 56 16: 56 17: 56 18: 56 19: 56 20: 56
21: 56 22: 56 23: 56 24: 56 25: 56 26: 56
27: 56 28: 56 29: 56 30: 56 31: 56 32: 56
33: 56 35: 56 36: 56

There are 21 hits at base# 56














PleI gagtc 38
12: 16 13: 16 14: 16 15: 16 16: 16 17: 16
18: 16 19: 16 20: 16 21: 16 22: 16 23: 16
24: 16 25: 16 26: 16 27: 16 27: 26 28: 16
29: 16 31: 16 32: 16 33: 16 34: 16 35: 16
36: 16 36: 26 37: 16 38: 16 39: 16 40: 16
41: 16 42: 16 44: 16 45: 16 46: 16 47: 16
48: 46 49: 46

There are 34 hits at base# 16
















gactc 21
15: 56 16: 56 17: 56 18: 56 19: 56 20: 56
21: 56 22: 56 23: 56 24: 56 25: 56 26: 56
27: 56 28: 56 29: 56 30: 56 31: 56 32: 56
33: 56 35: 56 36: 56















There are 21 hits at base# 56 

AlwNI CAGNNNctg 26
15: 68 16: 68 17: 68 18: 68 19: 68 20: 68
21: 68 22: 68 23: 68 24: 68 25: 68 26: 68
27: 68 28: 68 29: 68 30: 68 31: 68 32: 68
33: 68 34: 68 35: 68 36: 68 39: 46 40: 46
41: 46 42: 46

There are 22 hits at base# 68




Table 8: Kappa FR1 GLGs 





! 1 2 3 4 5 6 7 8 9 10 11 12
! 13 14 15 16 17 18 19 20 21 22 23

GAC ATC CAG ATG ACC CAG TCT CCA TCC TCC CTG TCT
GCA TCT GTA GGA GAC AGA GTC ACC ATC ACT TGC ! O12
GAC ATC CAG ATG ACC CAG TCT CCA TCC TCC CTG TCT
GCA TCT GTA GGA GAC AGA GTC ACC ATC ACT TGC ! O2
GAC ATC CAG ATG ACC CAG TCT CCA TCC TCC CTG TCT
GCA TCT GTA GGA GAC AGA GTC ACC ATC ACT TGC ! O18
GAC ATC CAG ATG ACC CAG TCT CCA TCC TCC CTG TCT
GCA TCT GTA GGA GAC AGA GTC ACC ATC ACT TGC ! O8
GAC ATC CAG ATG ACC CAG TCT CCA TCC TCC CTG TCT
GCA TCT GTA GGA GAC AGA GTC ACC ATC ACT TGC ! A20
GAC ATC CAG ATG ACC CAG TCT CCA TCC TCC CTG TCT
GCA TCT GTA GGA GAC AGA GTC ACC ATC ACT TGC ! A30
AAC ATC CAG ATG ACC CAG TCT CCA TCT ... ATG ...
GCA TCT GTA GGA GAC AGA GTC ACC ATC ACT TGT ! L14
GAC ATC CAG ATG ACC CAG TCT CCA TCC TCA CTG TCT
GCA TCT GTA GGA GAC AGA GTC ACC ATC ACT TGT ! L1
GAC ATC CAG ATG ACC CAG TCT CCA TCC TCA CTG TCT
GCA TCT GTA GGA GAC AGA GTC ACC ATC ACT TGT ! L15
GCC ATC CAG TTG ACC CAG TCT CCA TCC TCC CTG TCT
GCA TCT GTA GGA GAC AGA GTC ACC ATC ACT TGC ! L4
GCC ATC CAG TTG ACC CAG TCT CCA TCC TCC CTG TCT
GCA TCT GTA GGA GAC AGA GTC ACC ATC ACT TGC ! L18
GAC ATC CAG ATG ACC CAG TCT CCA TCT TCC GTG TCT
GCA TCT GTA GGA GAC AGA GTC ACC ATC ACT TGT ! L5
GAC ATC CAG ATG ACC CAG TCT CCA TCT TCT GTG TCT
GCA TCT GTA GGA GAC AGA GTC ACC ATC ACT TGT ! L19
GAC ATC CAG TTG ACC CAG TCT CCA TCC TTC CTG TCT
GCA TCT GTA GGA GAC AGA GTC ACC ATC ACT TGC ! L8
GCC ATC CGG ATG ACC CAG TCT CCA TTC TCC CTG TCT
GCA TCT GTA GGA GAC AGA GTC ACC ATC ACT TGC ! L23
GCC ATC CGG ATG ACC CAG TCT CCA TCC TCA TTC TCT
GCA TCT ACA GGA GAC AGA GTC ACC ATC ACT TGT ! L9





GTC ATC TGG ATG ACC CAG TCT CCA TCC TTA CTC TCT
GCA TCT ACA GGA GAC AGA GTC ACC ATC AGT TGT ! L24
GCC ATC CAG ATG ACC CAG TCT CCA TCC TCC CTG TCT
GCA TCT GTA GGA GAC AGA GTC ACC ATC ACT TGC ! L11
GAC ATC CAG ATG ACC CAG TCT CCT TCC ACC CTG TCT
GCA TCT GTA GGA GAC AGA GTC ACC ATC ACT TGC ! L12
GAT ATT GTG ATG ACC CAG ACT CCA CTC TCC CTG CCC
GTC ACC CCT GGA GAG CCG GCC TCC ATC TCC TGC ! O11
GAT ATT GTG ATG ACC CAG ACT CCA CTC TCC CTG CCC
GTC ACC CCT GGA GAG CCG GCC TCC ATC TCC TGC ! O1
GAT GTT GTG ATG ACT CAG TCT CCA CTC TCC CTG CCC
GTC ACC CTT GGA CAG CCG GCC TCC ATC TCC TGC ! A17
GAT GTT GTG ATG ACT CAG TCT CCA CTC TCC CTG CCC
GTC ACC CTT GGA CAG CCG GCC TCC ATC TCC TGC ! A1
GAT ATT GTG ATG ACC CAG ACT CCA CTC TCT CTG TCC
GTC ACC CCT GGA CAG CCG GCC TCC ATC TCC TGC ! A18
GAT ATT GTG ATG ACC CAG ACT CCA CTC TCT CTG TCC
GTC ACC CCT GGA CAG CCG GCC TCC ATC TCC TGC ! A2
GAT ATT GTG ATG ACT CAG TCT CCA CTC TCC CTG CCC
GTC ACC CCT GGA GAG CCG GCC TCC ATC TCC TGC ! A19
GAT ATT GTG ATG ACT CAG TCT CCA CTC TCC CTG CCC
GTC ACC CCT GGA GAG CCG GCC TCC ATC TCC TGC ! A3
GAT ATT GTG ATG ACC CAG ACT CCA CTC TCC TCA CCT
GTC ACC CTT GGA CAG CCG GCC TCC ATC TCC TGC ! A23
GAA ATT GTG TTG ACG CAG TCT CCA GGC ACC CTG TCT
TTG TCT CCA GGG GAA AGA GCC ACC CTC TCC TGC ! A27
GAA ATT GTG TTG ACG CAG TCT CCA GCC ACC CTG TCT
TTG TCT CCA GGG GAA AGA GCC ACC CTC TCC TGC ! A11
GAA ATA GTG ATG ACG CAG TCT CCA GCC ACC CTG TCT
GTG TCT CCA GGG GAA AGA GCC ACC CTC TCC TGC ! L2
GAA ATA GTG ATG ACG CAG TCT CCA GCC ACC CTG TCT
GTG TCT CCA GGG GAA AGA GCC ACC CTC TCC TGC ! L16
GAA ATT GTG TTG ACA CAG TCT CCA GCC ACC CTG TCT
TTG TCT CCA GGG GAA AGA GCC ACC CTC TCC TGC ! L6
GAA ATT GTG TTG ACA CAG TCT CCA GCC ACC CTG TCT
TTG TCT CCA GGG GAA AGA GCC ACC CTC TCC TGC ! L20
GAA ATT GTA ATG ACA CAG TCT CCA GCC ACC CTG TCT
TTG TCT CCA GGG GAA AGA GCC ACC CTC TCC TGC ! L25
GAC ATC GTG ATG ACC CAG TCT CCA GAC TCC CTG GCT
GTG TCT CTG GGC GAG AGG GCC ACC ATC AAC TGC ! B3
GAA ACG ACA CTC ACG CAG TCT CCA GCA TTC ATG TCA
GCG ACT CCA GGA GAC AAA GTC AAC ATC TCC TGC ! B2
GAA ATT GTG CTG ACT CAG TCT CCA GAC TTT CAG TCT
GTG ACT CCA AAG GAG AAA GTC ACC ATC ACC TGC ! A26
GAA ATT GTG CTG ACT CAG TCT CCA GAC TTT CAG TCT
GTG ACT CCA AAG GAG AAA GTC ACC ATC ACC TGC ! A10
GAT GTT GTG ATG ACA CAG TCT CCA GCT TTC CTC TCT
GTG ACT CCA GGG GAG AAA GTC ACC ATC ACC TGC ! A14



X 
u 

tt > 



P 
C 



rv 
m 

u 



C\J 



1 



v 



m 



9 

CO 

? 



8 



o 



IN 

o 



oo 

o 



3 



o 
o 



00 

—i 



Cs 



3 



2 



2 



o 



m 

01 



8 

0 



00 



=8 



3 
2 



8 



O 
ID 



8 



m 
o 



oo 

CM 



oo 
so 



oo 



CM 
CM 



CM 



CN 
i— i 
O 



I 



oo 

CM 



CM 



cm 

CN 

< 



3 

CM 



VO 
Cv 
CNI 

8 

CM 



IT) 



X 

U 
>> 

X 5 



M 



A 
O 



4* 

SI 



S 



oo 



IF 



to 

TO 



<N 



Ml 



CQ 



SO 



o 
< 



if) 



Pi 



£ * 1 



Ul 
Qi 
M 



3 

c 

i pi 



5h "3> 



V 
A 



A 

! 



vO 

o 



»— « 

o 



o 



QO 

o 



S 

3 



oo 
3 



3 



3 



m 



« QU 

a. j« 



a 

o 

1 



fll 



H 



co 
to 

CM 



en 



ry 



t~* 1/1 



§ 1 



(M 



CO 



CM 

in 



CM 
CM 



co 
r» 

CM 



00 



o 

CO 

CM 



2 





Hpall 


m 

a 

1 1 




























































p: 

«>v 

UI 

ff 3 : 


HphI 


CM 
O 

H 

SO 

m 

a 

00 
m 

a 






















3756 3762 


3856 3862 


3956 3962 


t. : 
•a. : 
1i: 
































ffi: 

G 

H : 


Maelll 


Tsp45I same 
sites 






















3737 3755 


3837 3855 


3937 3955 


rs • 
































m 
































1 

r- 


Mlyl 


• 

V 

A 
! 

A 

! 














3525 




3639 




3712 3739 


3812 3839 


3939 


rH 

1 


Hinfl 
















3525 




3639 




3712 3739 


3812 3839 


3939 




Sfcl 






























SfaNI 
































L2 3001-3069 


L16 3101-3169 


L6 3201-3269 


L20 3301-3369 


L25 3401-3469 




B3 3501-3569 




B2 3601-3669 




A26 3701-3769 


AlO 3801-3869 


A14 3901-3969 



if) o 



w 



in 
c 



x 



% 'S t o 
« U 2 Z 



1 

Cu 
PQ 



A 



CO 



T3 
C 

•a 

G 

O 



2 



2 

s 



2 

CQ 



1 

o 

1 



1 
1 
1 



S 



oo 

o 



3 



IT) 



CN 

o 
m 

a, 

(A 


• 


■ 


• 


• 


■ 


























Haelll 














1954 


2054 


2154 


2254 


2354 


2454 


2554 


2654 


2754 




BsrFI 
Cac8I 
Nael 
NgoMIV 


i 


• 




• 






m 

CN 


m 


m 

<N 


m 

CN 


m 
«n 
CN 


m 


m 
m 

CN 


m 

NO 

CN 


m 




I § A 














1944 


2044 










2544 


2644 


i 




BssKI (NstNI) 
xx22 xx30 xx43 














1943 


2043 






2343 


2443 


2543 


2643 






? 
i 

^> CN 

CN 














1942 


2042 


2142 


2242 


2342 


2442 


2542 


2642 


2742 






L23 1401-1469 


L9 1501-1569 


L24 1601-1669 


Lll 1701-1769 


L12 1801-1869 






Oil 1901-1969 


Ol 2001-2069 


A17 2101-2169 


Al 2201-2269 


A18 2301-2369 


A2 2401-2469 


A19 2501-2569 


A3 2601-2669 


A23 2701-2769 





p 

IT 

ff! 

S! 
?» 



m 
m 



o 



X 



U_ GO *r* *k 



PQ U 



Z 2 



5 J 

■a 1 ' 
I 1 1 



1 



<N 



§ 



o 



8 



o 
m 



CQ 



LO 



IT) 
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Table 10 Lambda FR1 GLG sequences 
! VL1 







CAG 


TCT 


GTG 


CTG 


ACT 


CAG 


CCA 


CCC 


TCG 


GTG 


TCT GAA 






GCC 


CCC 


AGG 


CAG 


AGG 


GTC 


ACC 


ATC 


TCC 


TGT 


! la 


5 




cag 


tct 


ata 


ctg 


acG 


caa 

^ 3 


ccG 


CCC 


tcA gtg 


tct gGG 






gcc 


ccA 


Gaa 


caq 

^ 3 


aaa 

^33 


qtc 


acc 


ate 


tec 


tgC 


! le 






caq 


tct 


qtq 


ctq 

3J 


act 


cag 


cca 


CCC 


tcA gCg 


tct gGG 






Acc 


CCC 


Gqq 


caq 


aqq 

3 3 


qtc 


acc 


ate 


tcT 


tgt 


! 1c 






cag 


tct 


qtq 


ctq 


act 


cag 


cca 


CCC 


tcA gCg 


tct gGG 


10 




Acc 


CCC 


Gqq 

3 3 


caq 


aqq 

3 3 


qtc 


acc 


ate 


tcT 


tgt 


! Ig 






caq 


tct 


qtq 

3 3 


Ttg 


acG 


cag 


ccG 


CCC 


tcA 


gtg 


tct gCG 






acc 


ccA 


GqA 


caq 


aAg 


qtc 


acc 


ate 


tec 


tgC 


! lb 




! VL2 




























CAG 


TCT 


GCC 


CTG 


ACT 


CAG 


CCT 


CCC 


TCC 


GCG 


TCC GGG 


15 




TCT 


CCT 


GGA 


CAG 


TCA 


GTC 


ACC 


ATC 


TCC 


TGC 


! 2c 






cag 


tct 


gcc 


eta 


act 


caq 


cct 


cGc 


tcA gTg 


tec ggg 

3 3 3 






tct 


cct 


aaa 
yy a 


cag 


tea 


ate 


acc 


ate 


tec 


tgc! 


2e 






caa 


tct 


qcc 


ctq 


act 


cag 


cct 


Gcc 


tec 


gTg 


tcT ggg 






tct 


cct 


qqa 

3 3 


caq 


tcG 


Ate 


acc 


ate 


tec 


tgc 


! 2a2 


20 




caq 


tct 


qcc 


ctq 


act 


caq 


cct 


CCC 


tec 


gTg 


tec ggg 






tct 


cct 


qqa 

3 3 


caq 


tea 


qtc 


acc 


ate 


tec 


tgc 


! 2d 






caq 


tct 


qcc 


ctg 


act 


cag 


cct 


Gcc 


tec 


gTg 


tcT ggg 






tct 


cct 


qqa 

3 3 


caq 


tcG 


Ate 


acc 


ate 


tec 


tgc 


! 2b2 




! VL3 




























1 ^L. 


TAT 




CTG 


APT 




CCA 


CCC 


TCA 


GTG 


TCC GTG 






TCC 


CCA 


GGA 


CAG 


ACA 


GCC 


AGC 


ATC 


ACC 


TGC! 


3r 






tec 


tat 


gag 


ctg 


act 


cag 


cca 


cTc 


tea 


gtg 


tcA gtg 






Gcc 


cTG 


gga 


cag 


acG 


gcc 


agG 


atT 


acc 


tgT 


! 3j 






tec 


tat 


gag 


ctg 


acA 


cag 


cca 


CCC 


tcG 


gtg 


tcA gtg 


30 




tec 


cca 


gga 


caA 


acG 


gcc 


agG 


ate 


acc 


tgc! 


3p 






tec 


tat 


gag 


ctg 


acA 


cag 


cca 


CCC 


tcG 


gtg 


tcA gtg 






tec 


cTa 


gga 


cag 


aTG 


gcc 


agG 


ate 


acc 


tgc 


! 3a 






tcT 


tct 


gag 


ctg 


act 


cag 


GAC 


ccT 


GcT 


gtg 


tcT gtg 






Gcc 


TTG 


gga 


cag 


aca 


gTc 


agG 


ate 


acA 


tgc 


! 31 
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tec tat 
gTg ctg act cag cca ccc tca gtg tcA gtg
Gcc cca gga Aag acG gcc agG atT acc tgT ! 3h
tcc tat gag ctg acA cag cTa ccc tcG gtg tcA gtg
tcc cca gga cag aca gcc agG atc acc tgc ! 3e
tcc tat gag ctg aTG cag cca ccc tcG gtg tcA gtg
tcc cca gga cag acG gcc agG atc acc tgc ! 3m
tcc tat gag ctg acA cag cca Tcc tca gtg tcA gtg
tcT ccG gga cag aca gcc agG atc acc tgc ! V2-19

CTG CCT GTG CTG ACT CAG CCC CCG TCT GCA TCT GCC
TTG CTG GGA GCC TCG ATC AAG CTC ACC TGC ! 4c
cAg cct gtg ctg act caA TcA TcC tct gcC tct gcT
tCC ctg gga Tcc tcg Gtc aag ctc acc tgc ! 4a
cAg cTt gtg ctg act caA TcG ccC tct gcC tct gcc
tCC ctg gga gcc tcg Gtc aag ctc acc tgc ! 4b

CAG CCT GTG CTG ACT CAG CCA CCT TCC TCC TCC GCA
TCT CCT GGA GAA TCC GCC AGA CTC ACC TGC ! 5e
cag Gct gtg ctg act cag ccG Gct tcc CTc tcT gca
tct cct gga gCa tcA gcc agT ctc acc tgc ! 5c
cag cct gtg ctg act cag cca Tct tcc CAT tcT gca
tct Tct gga gCa tcA gTc aga ctc acc tgc ! 5b

AAT TTT ATG CTG ACT CAG CCC CAC TCT GTG TCG GAG
TCT CCG GGG AAG ACG GTA ACC ATC TCC TGC ! 6a

CAG ACT GTG GTG ACT CAG GAG CCC TCA CTG ACT GTG
TCC CCA GGA GGG ACA GTC ACT CTC ACC TGT ! 7a
cag Gct gtg gtg act cag gag ccc tca ctg act gtg
tcc cca gga ggg aca gtc act ctc acc tgt ! 7b

CAG ACT GTG GTG ACC CAG GAG CCA TCG TTC TCA GTG
TCC CCT GGA GGG ACA GTC ACA CTC ACT TGT ! 8a
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! VL9 

CAG CCT GTG CTG ACT CAG CCA CCT TCT GCA TCA GCC 

TCC CTG GGA GCC TCG GTC ACA CTC ACC TGC ! 9a 

! VL10 

CAG GCA GGG CTG ACT CAG CCA CCC TCG GTG TCC AAG

GGC TTG AGA CAG ACC GCC ACA CTC ACC TGC ! 10a 



u 

e 

£ 

m 
m 

M 

h 

IV 

m 
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in 
m 

M 
Ji; 

O 



Q 
H 



10 



15 



20 



25 



30 



Table 11 RERSs found in human lambda FR1 GLGs 

! There are 31 lambda GLGs 

MlyI NnnnnnGACTC 25
1: 6 3: 6 4: 6 6: 6 7: 6 8: 6
9: 6 10: 6 11: 6 12: 6 15: 6 16: 6
20: 6 21: 6 22: 6 23: 6 23: 50 24: 6
25: 6 25: 50 26: 6 27: 6 28: 6 30: 6
31: 6

There are 23 hits at base# 6

GAGTCNNNNNn
26: 34



MwoI GCNNNNNnngc 20
1: 9 2: 9 3: 9 4: 9 11: 9 11: 56
12: 9 13: 9 14: 9 16: 9 17: 9 18: 9
19: 9 20: 9 23: 9 24: 9 25: 9 26: 9
30: 9 31: 9

There are 19 hits at base# 9

HinfI Gantc 27
1: 12 3: 12 4: 12 6: 12 7: 12 8: 12
9: 12 10: 12 11: 12 12: 12 15: 12 16: 12
20: 12 21: 12 22: 12 23: 12 23: 46 23: 56
24: 12 25: 12 25: 56 26: 12 26: 34 27: 12
28: 12 30: 12 31: 12

There are 23 hits at base# 12

PleI gactc 25
1: 12 3: 12 4: 12 6: 12 7: 12 8: 12
9: 12 10: 12 11: 12 12: 12 15: 12 16: 12
20: 12 21: 12 22: 12 23: 12 23: 56 24: 12
25: 12 25: 56 26: 12 27: 12 28: 12 30: 12
31: 12

There are 23 hits at base# 12



35 



gagtc 
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26: 34 



DdeI Ctnag 32
1: 14 2: 24 3: 14 3: 24 4: 14 4: 24
5: 24 6: 14 7: 14 7: 24 8: 14 9: 14
10: 14 11: 14 11: 24 12: 14 12: 24 15: 5
15: 14 16: 14 16: 24 19: 24 20: 14 23: 14
24: 14 25: 14 26: 14 27: 14 28: 14 29: 30
30: 14 31: 14

There are 21 hits at base# 14

BsaJI Ccnngg 38
1: 23 1: 40 2: 39 2: 40 3: 39 3: 40
4: 39 4: 40 5: 39 11: 39 12: 38 12: 39
13: 23 13: 39 14: 23 14: 39 15: 38 16: 39
17: 23 17: 39 18: 23 18: 39 21: 38 21: 39
21: 47 22: 38 22: 39 22: 47 26: 40 27: 39
28: 39 29: 14 29: 39 30: 38 30: 39 30: 47
31: 23 31: 32

There are 17 hits at base# 39
There are 5 hits at base# 38
There are 5 hits at base# 40 Makes cleavage ragged.

MnlI cctc 35
1: 23 2: 23 3: 23 4: 23 5: 23 6: 19
6: 23 7: 19 8: 23 9: 19 9: 23 10: 23
11: 23 13: 23 14: 23 16: 23 17: 23 18: 23
19: 23 20: 47 21: 23 21: 29 21: 47 22: 23
22: 29 22: 35 22: 47 23: 26 23: 29 24: 27
27: 23 28: 23 30: 35 30: 47 31: 23

There are 21 hits at base# 23
There are 3 hits at base# 19
There are 3 hits at base# 29
There are 1 hits at base# 26
There are 1 hits at base# 27 These could make cleavage ragged.

gagg 7
1: 48 2: 48 3: 48 4: 48 27: 44 28: 44
29: 44
























BssKI Nccngg 39
1: 40 2: 39 3: 39 3: 40 4: 39 4: 40
5: 39 6: 31 6: 39 7: 31 7: 39 8: 39
9: 31 9: 39 10: 39 11: 39 12: 38 12: 52
13: 39 13: 52 14: 52 16: 39 16: 52 17: 39
17: 52 18: 39 18: 52 19: 39 19: 52 21: 38
22: 38 23: 39 24: 39 26: 39 27: 39 28: 39
29: 14 29: 39 30: 38

There are 21 hits at base# 39
There are 4 hits at base# 38
There are 3 hits at base# 31
There are 3 hits at base# 40 Ragged



BstNI CCwgg 30
1: 41 2: 40 5: 40 6: 40 7: 40 8: 40
9: 40 10: 40 11: 40 12: 39 12: 53 13: 40
13: 53 14: 53 16: 40 16: 53 17: 40 17: 53
18: 40 18: 53 19: 53 21: 39 22: 39 23: 40
24: 40 27: 40 28: 40 29: 15 29: 40 30: 39

There are 17 hits at base# 40
There are 7 hits at base# 53
There are 4 hits at base# 39
There are 1 hits at base# 41 Ragged



PspGI ccwgg 30
1: 41 2: 40 5: 40 6: 40 7: 40 8: 40
9: 40 10: 40 11: 40 12: 39 12: 53 13: 40
13: 53 14: 53 16: 40 16: 53 17: 40 17: 53
18: 40 18: 53 19: 53 21: 39 22: 39 23: 40
24: 40 27: 40 28: 40 29: 15 29: 40 30: 39

There are 17 hits at base# 40
There are 7 hits at base# 53
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There are 4 hits at base# 39 
There are 1 hits at base# 41 



ScrFI CCngg 39
1: 41 2: 40 3: 40 3: 41 4: 40 4: 41
5: 40 6: 32 6: 40 7: 32 7: 40 8: 40
9: 32 9: 40 10: 40 11: 40 12: 39 12: 53
13: 40 13: 53 14: 53 16: 40 16: 53 17: 40
17: 53 18: 40 18: 53 19: 40 19: 53 21: 39
22: 39 23: 40 24: 40 26: 40 27: 40 28: 40
29: 15 29: 40 30: 39

There are 21 hits at base# 40
There are 4 hits at base# 39
There are 3 hits at base# 41

MaeIII gtnac 16
1: 52 2: 52 3: 52 4: 52 5: 52 6: 52
7: 52 9: 52 26: 52 27: 10 27: 52 28: 10
28: 52 29: 10 29: 52 30: 52

There are 13 hits at base# 52



Tsp45I gtsac 15
1: 52 2: 52 3: 52 4: 52 5: 52 6: 52
7: 52 9: 52 27: 10 27: 52 28: 10 28: 52
29: 10 29: 52 30: 52

There are 12 hits at base# 52



HphI tcacc 26
1: 53 2: 53 3: 53 4: 53 5: 53 6: 53
7: 53 8: 53 9: 53 10: 53 11: 59 13: 59
14: 59 17: 59 18: 59 19: 59 20: 59 21: 59
22: 59 23: 59 24: 59 25: 59 27: 59 28: 59
30: 59 31: 59

There are 16 hits at base# 59
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There are 10 hits at base# 53 

BspMI ACCTGCNNNNn 14 
11: 61 13: 61 14: 61 17: 61 18: 61 19: 61
20: 61 21: 61 22: 61 23: 61 24: 61 25: 61
30: 61 31: 61

There are 14 hits at base# 61 Goes into CDR1 
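
The listings in this table can be reproduced in outline by scanning each FR1 sequence for a recognition sequence and recording the 1-based position at which each site starts. The following is a minimal sketch under stated assumptions, not the program used to prepare the tables: the function names `site_positions` and `tally` are hypothetical, lowercase letters in a recognition sequence are treated the same as uppercase, and only the strand as written is scanned. The two sequences are lambda GLGs 9a and 10a as given in Table 10.

```python
import re
from collections import Counter

# IUPAC nucleotide codes mapped to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def site_positions(recognition, sequence):
    """Return 1-based start positions of every (possibly overlapping) site."""
    pattern = "".join(IUPAC[base] for base in recognition.upper())
    seq = sequence.upper().replace(" ", "")
    # A lookahead finds overlapping sites such as CCNNGG at bases 38 and 39.
    return [m.start() + 1 for m in re.finditer(f"(?={pattern})", seq)]


def tally(recognition, genes):
    """Count, over all genes, how many sites start at each base position."""
    counts = Counter()
    for seq in genes.values():
        counts.update(site_positions(recognition, seq))
    return counts


# Lambda FR1 GLGs 9a and 10a, copied from Table 10.
GENES = {
    "9a": "CAG CCT GTG CTG ACT CAG CCA CCT TCT GCA TCA GCC "
          "TCC CTG GGA GCC TCG GTC ACA CTC ACC TGC",
    "10a": "CAG GCA GGG CTG ACT CAG CCA CCC TCG GTG TCC AAG "
           "GGC TTG AGA CAG ACC GCC ACA CTC ACC TGC",
}

if __name__ == "__main__":
    for base, hits in sorted(tally("ACCTGC", GENES).items()):
        print(f"There are {hits} hits at base# {base}")
```

For these two sequences the sketch finds the BspMI recognition sequence ACCTGC at base 61 in both, which agrees with the entries 30: 61 and 31: 61 above.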
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Table 12: Matches to URE FR3 adapters 


in 79 human HC.

A. List of heavy-chain genes sampled








AF008566 




AF103367 




HSA235674 


HSU94417 


S83240 




AF035043 




AF103368 




HSA235673 


HSU94418 


SABVH369 


5 


AF103026 




AF103369 




HSA240559 


HSU96389 


SADEIGVH 




afl03033 




AF103370 




HSCB201 


HSU96391 


SAH2IGVH 




AF103061 




afl03371 




HSIGGVHC 


HSU96392 


SDA3IGVH 




Afl03072 




AF103372 




HSU44791 


HSU96395 


SIGVHTTD 




afl03078 




Ar 1 Do J o 1 






HSZ93849 


SUK4IGVH 


10 


AF103099 




E05213 




HSU82771 


HSZ93850 










E05886 




HSU82949 


HSZ93851 






AF103103 




E05887 




HSU82950 


HSZ93853 






AF103174 




HSA235661 




HSU82952 


HSZ93855 






AF103186 




HSA235664 




HSU82961 


HSZ93857 




15 


afl03187 




HSA235660 




HSU86522 


HSZ93860 






AF103195 




HSA235659 




HSU86523 


HSZ93863 






afl03277 




HSA235678 




HSU92452 


MCOMFRAA 






afl03286 




HSA235677 




HSU94412 


MCOMFRVA 






AF103309 




HSA23567 6 




HSU94415 


S82745 




20 


afl03343 




HSA235675 




HSU94416 


S82764 






Table 12B. Testing all distinct GLGs from bases 89.1 to 93.2 of the heavy variable domain

Id  Nb   0   1   2   3   4                         SEQ ID NO:
1   38  15  11  10   0   2  Seq1  gtgtattactgtgc   25
2   19   7   6   4   2   0  Seq2  gtAtattactgtgc   26
3    1   0   0   1   0   0  Seq3  gtgtattactgtAA   27
4    7   1   5   1   0   0  Seq4  gtgtattactgtAc   28
5    0   0   0   0   0   0  Seq5  Ttgtattactgtgc   29
6    0   0   0   0   0   0  Seq6  TtgtatCactgtgc   30
7    3   1   0   1   1   0  Seq7  ACAtattactgtgc   31
8    2   0   2   0   0   0  Seq8  ACgtattactgtgc   32
9    9   2   2   4   1   0  Seq9  ATgtattactgtgc   33
Group       26  26  21   4   2
Cumulative  26  52  73  77  79








Table 12C Most important URE recognition seqs in FR3 Heavy

1 VHSzy1 GTGtattactgtgc (ON_SHC103) (SEQ ID NO: 25)
2 VHSzy2 GTAtattactgtgc (ON_SHC323) (SEQ ID NO: 26)
3 VHSzy4 GTGtattactgtac (ON_SHC349) (SEQ ID NO: 28)
4 VHSzy9 ATGtattactgtgc (ON_SHC5a) (SEQ ID NO: 33)



Table 12D, testing 79 human HC V genes with four probes 

Number of sequences 79 

Number of bases 29143 
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Number of mismatches 



Id  Best   0   1   2   3   4   5
1   39    15  11  10   1   2   0  Seq1  gtgtattactgtgc  (SEQ ID NO: 25)
2   22     7   6   5   3   0   1  Seq2  gtAtattactgtgc  (SEQ ID NO: 26)
3    7     1   5   1   0   0   0  Seq4  gtgtattactgtAc  (SEQ ID NO: 28)
4   11     2   4   4   1   0   0  Seq9  ATgtattactgtgc  (SEQ ID NO: 33)
Group       25  26  20   5   2
Cumulative  25  51  71  76  78



One sequence has five mismatches with sequences 2, 4, and 9; 
it is scored as best for 2. 

Id is the number of the adapter. 

Best is the number of sequence for which the identified 
adapter was the best available. 

The rest of the table shows how well the sequences match the 
adapters. For example, there are 10 sequences that match 
VHSzy1 (Id=1) with 2 mismatches and are worse for all other
adapters. In this sample, 90% come within 2 bases of one of 
the four adapters. 
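
The scoring described above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the program that produced Table 12D: the names `mismatches`, `best_adapter` and `cumulative_percent` are hypothetical, each target is assumed to be already aligned to the 14-base adapter window, and ties are given to the lowest adapter Id, which is consistent with the tied sequence above being scored as best for 2.

```python
# The four adapter recognition sequences of Table 12C, keyed by adapter Id.
ADAPTERS = {
    1: "gtgtattactgtgc",  # VHSzy1
    2: "gtatattactgtgc",  # VHSzy2
    3: "gtgtattactgtac",  # VHSzy4
    4: "atgtattactgtgc",  # VHSzy9
}


def mismatches(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a.lower(), b.lower()))


def best_adapter(target):
    """Return (Id, mismatch count) of the best adapter; ties go to the lowest Id."""
    scores = {ident: mismatches(target, seq) for ident, seq in ADAPTERS.items()}
    best = min(scores, key=lambda ident: (scores[ident], ident))
    return best, scores[best]


def cumulative_percent(group_counts, total):
    """Running percentage of sequences within 0, 1, 2, ... mismatches."""
    running, result = 0, []
    for count in group_counts:
        running += count
        result.append(100.0 * running / total)
    return result


if __name__ == "__main__":
    print(best_adapter("gtatattactgtac"))
    # Group row of Table 12D: sequences with 0, 1, 2, 3, 4 mismatches.
    print([round(p) for p in cumulative_percent([25, 26, 20, 5, 2], 79)])
```

With the Group row of Table 12D, the third cumulative value is 71 of 79 sequences, which rounds to the 90% quoted above.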
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Table 13 



The following list of enzymes was taken from
http://rebase.neb.com/cgi-bin/asymmlist.

I have removed the enzymes that a) cut within the recognition sequence, b) cut on
both sides of the recognition sequence, or c) have fewer than 2 bases between
recognition sequence and closest cut site.



REBASE Enzymes 
04/13/2001 



o 



Si 

W 



10 Type II restriction enzymes with asymmetric recognition sequences; 



15 



20 



25 



30 



35 



40 



45 



Enzymes     Recognition Sequence            Isoschizomers   Suppliers
AarI        CACCTGCNNNN^NNNN_               -               y
AceIII      CAGCTCNNNNNNN^NNNN_             -               -
Bbr7I       GAAGACNNNNNNN^NNNN_             -               -
BbvI        GCAGCNNNNNNNN^NNNN_             -               y
BbvII       GAAGACNN^NNNN_                  -               -
Bce83I      CTTGAGNNNNNNNNNNNNNN_NN^        -               -
BceAI       ACGGCNNNNNNNNNNNN^NN_           -               y
BcefI       ACGGCNNNNNNNNNNNN^N_            -               -
BciVI       GTATCCNNNNN_N^                  BfuI            y
BfiI        ACTGGGNNNN_N^                   BmrI            y
BinI        GGATCNNNN^N_                    -               -
BscAI       GCATCNNNN^NN_                   -               -
BseRI       GAGGAGNNNNNNNN_NN^              -               y
BsmFI       GGGACNNNNNNNNNN^NNNN_           BspLU11III      y
BspMI       ACCTGCNNNN^NNNN_                Acc36I          y
EciI        GGCGGANNNNNNNNN_NN^             -               y
Eco57I      CTGAAGNNNNNNNNNNNNNN_NN^        BspKT5I         y
FauI        CCCGCNNNN^NN_                   BstFZ438I       y
FokI        GGATGNNNNNNNNN^NNNN_            BstPZ418I       y
GsuI        CTGGAGNNNNNNNNNNNNNN_NN^        -               y
HgaI        GACGCNNNNN^NNNNN_               -               y
HphI        GGTGANNNNNNN_N^                 AsuHPI          y
MboII       GAAGANNNNNNN_N^                 -               y
MlyI        GAGTCNNNNN^                     SchI            y
MmeI        TCCRACNNNNNNNNNNNNNNNNNN_NN^    -               -
MnlI        CCTCNNNNNN_N^                   -               y
PleI        GAGTCNNNN^N_                    PpsI            y
RleAI       CCCACANNNNNNNNN_NNN^            -               -
SfaNI       GCATCNNNNN^NNNN_                BspST5I         y
SspD5I      GGTGANNNNNNNN^                  -               -
Sth132I     CCCGNNNN^NNNN_                  -               -
StsI        GGATGNNNNNNNNNN^NNNN_           -               -
TaqII       GACCGANNNNNNNNN_NN^,            -               -
            CACCCANNNNNNNNN_NN^
Tth111II    CAARCANNNNNNNNN_NN^             -               -
UbaPI       CGAACG                          -               -

The notation is: ^ means cut the upper strand and _ means cut the lower
strand. If the upper and lower strands are cut at the same place, then only
^ appears.
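
As an illustration of the notation just described, the following Python sketch converts an entry such as GGATGNNNNNNNNN^NNNN_ into the two cut distances measured from the end of the specific recognition bases. It is a sketch under stated assumptions, not part of REBASE: the function names are hypothetical, and the recognition sequence is assumed to be everything before the trailing run of N's.

```python
def parse_cut_notation(site):
    """Return (bases, upper_cut, lower_cut) for a string using ^ and _ cut marks.

    Cut positions count the bases written to the left of the mark.
    If no _ is present, the lower strand is taken to be cut where ^ is.
    """
    bases = []
    upper = lower = None
    for ch in site:
        if ch == "^":
            upper = len(bases)
        elif ch == "_":
            lower = len(bases)
        else:
            bases.append(ch)
    if upper is not None and lower is None:
        lower = upper
    return "".join(bases), upper, lower


def cut_offsets(site):
    """Return (upper, lower) cut distances from the end of the recognition sequence.

    Assumption: the recognition sequence is the part left after stripping
    the trailing N spacer. Entries without a ^ mark give (None, None).
    """
    bases, upper, lower = parse_cut_notation(site)
    if upper is None:
        return None, None
    core_length = len(bases.rstrip("N"))
    return upper - core_length, lower - core_length
```

For example, cut_offsets("GGATGNNNNNNNNN^NNNN_") gives (9, 13) for FokI and cut_offsets("GAGGAGNNNNNNNN_NN^") gives (10, 8) for BseRI.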



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Table 15: Use of Fokl as "Universal Restriction Enzyme" 

FokI - for dsDNA, | represents sites of cleavage

                      sites of cleavage
   5'-cacGGATGtg--nnnnnnn|nnnnnnn-3'          (SEQ ID NO:15)
   3'-gtgCCTACac--nnnnnnnnnnn|nnn-5'          (SEQ ID NO:16)
         RECOGNITion of FokI

Case I

   5'-...gtg|tatt-actgtgc..Substrate...-3'    (SEQ ID NO:17)
      3'-cac-ataa|tgacacg-->
                  gtGTAGGcac\
               5'-caCATCCgtg/                 (SEQ ID NO:18)

Case II

   5'-...gtgtatt|agac-tgc..Substrate...-3'    (SEQ ID NO:19)
      +--cacataa-tctg|acg-5'
      | /gtgCCTACac
      | \cacGGATGtg-3'                        (SEQ ID NO:20)

Case III (Case I rotated 180 degrees)

      /gtgCCTACac-5'
      \cacGGATGtg-+
                  gtgtctt|acag-tcc-3'  Adapter          (SEQ ID NO:21)
           3'-...cacagaa-tgtc|agg..substrate...-5'      (SEQ ID NO:22)

Case IV (Case II rotated 180 degrees)

        3'-gtGTAGGcac\                        (SEQ ID NO:23)
         +-caCATCCgtg/
        5'-gag|tctc-actgagc
Substrate 3'-...ctc-agag|tgactcg...-5'        (SEQ ID NO:24)

Improved FokI adapters

FokI - for dsDNA, | represents sites of cleavage

Case I

Stem 11, loop 5, stem 11, recognition 17

   5'-...catgtg|tatt-actgtgc..Substrate...-3'
      3'-gtacac-ataa|tgacacg-+    +-T-+
                     gtGTAGGcacG      T
                  5'-caCATCCgtgc      C

Case II

Stem 10, loop 5, stem 10, recognition 18

        5'-...gtgtatt|agac-tgctgcc..Substrate...-3'
   +-T-+   +--cacataa-tctg|acgacgg-5'
   T     gtgCCTACac
   C     cacGGATGtg-3'
   +-TT-+

Case III (Case I rotated 180 degrees)
Stem 11, loop 5, stem 11, recognition 20

   +-T-+
   T     TgtgCCTACac-5'
   G     AcacGGATGtg-+
   +-TT-+            gtgtctt|acag-tccattctg-3'  Adapter
              3'-...cacagaa-tgtc|aggtaagac..substrate...-5'

Case IV (Case II rotated 180 degrees)

Stem 11, loop 4, stem 11, recognition 17

           3'-gtGTAGGcacc      T
            +-caCATCCgtgg      T
      5'-atcgag|tctc-actgagc  +-T-+
Substrate 3'-...tagctc-agag|tgactcg...-5'

BseRI

                      | sites of cleavage
   5'-cacGAGGAGnnnnnnnnnn|nnnnn-3'
   3'-gtgctcctcnnnnnnnn|nnnnnnn-5'
         RECOGNITion of BseRI

Stem 11, loop 5, stem 11, recognition 19

             3'-gaacat|cg-ttaagccagta-5'
   +-T-T-+      cttgta-gc|aattcggtcat-3'
   C      GCTGAGGAGTC--+
   T      cgactcctcag-5'    An adapter for BseRI to cleave the substrate above.
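
The cut geometry drawn above for FokI (9 bases downstream on the strand carrying GGATG, 13 on the other) and for BseRI (10 and 8) can be checked with a small Python sketch. This is an illustration only: the function name is hypothetical, only the orientation in which the recognition sequence reads 5' to 3' on the given strand is searched, and positions are 0-based indices into that strand, with the cut falling immediately before the index.

```python
def type_iis_cuts(seq, recognition, upper_offset, lower_offset):
    """List (site_start, top_cut, bottom_cut) for each occurrence of the recognition site.

    top_cut and bottom_cut are indices into `seq`; each strand is cut
    immediately before that index. Sites whose cut would fall beyond the
    end of `seq` are skipped.
    """
    s = seq.upper()
    r = recognition.upper()
    hits = []
    start = s.find(r)
    while start != -1:
        end = start + len(r)
        top, bottom = end + upper_offset, end + lower_offset
        if max(top, bottom) <= len(s):
            hits.append((start, top, bottom))
        start = s.find(r, start + 1)
    return hits


# The generic duplexes from the diagrams, with n standing for any base.
fokI_duplex = "cacGGATGtg" + "n" * 14
bseRI_duplex = "cacGAGGAG" + "n" * 15
```

With these inputs, type_iis_cuts(fokI_duplex, "GGATG", 9, 13) places the upper-strand cut before index 17 and the lower-strand cut before index 21, matching the positions of | in the FokI diagram.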
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What happens in the top strand: 



                           | site of cleavage in the upper strand
(VL133-2a2*)  5'-g tct cct g|ga cag tcg atc
(VL133-31*)   5'-g gcc ttg g|ga cag aca gtc
(VL133-2c*)   5'-g tct cct g|ga cag tca gtc
(VL133-1c*)   5'-g gcc cca g|gg cag agg gtc

The following Extenders and Bridges all encode the AA sequence of 2a2 for codons 1-15.

                                            1
(ON_LamEx133)       5'-ccTcTgAcTgAgT gcA cAg -
      2   3   4   5   6   7   8   9   10  11  12
     AGt gcT TtA acC caA ccG gcT AGT gtT AGC ggT-
      13  14  15
     tcC ccG g                        ! 2a2

                                            1
(ON_LamB1-133) [RC] 5'-ccTcTgAcTgAgT gcA cAg -
      2   3   4   5   6   7   8   9   10  11  12
     AGt gcT TtA acC caA ccG gcT AGT gtT AGC ggT-
      13  14  15
     tcC ccG g ga cag tcg at-3'       ! 2a2  N.B. the actual seq is the
                                             reverse complement of the
                                             one shown.

(ON_LamB2-133) [RC] 5'-ccTcTgAcTgAgT gcA cAg -
      2   3   4   5   6   7   8   9   10  11  12
     AGt gcT TtA acC caA ccG gcT AGT gtT AGC ggT-
      13  14  15
     tcC ccG g ga cag aca gt-3'       ! 31   N.B. the actual seq is the
                                             reverse complement of the
                                             one shown.

(ON_LamB3-133) [RC] 5'-ccTcTgAcTgAgT gcA cAg -
      2   3   4   5   6   7   8   9   10  11  12
     AGt gcT TtA acC caA ccG gcT AGT gtT AGC ggT-
      13  14  15
     tcC ccG g ga cag tca gt-3'       ! 2c   N.B. the actual seq is the
                                             reverse complement of the
                                             one shown.

(ON_LamB4-133) [RC] 5'-ccTcTgAcTgAgT gcA cAg -
      2   3   4   5   6   7   8   9   10  11  12
     AGt gcT TtA acC caA ccG gcT AGT gtT AGC ggT-
      13  14  15
     tcC ccG g gg cag agg gt-3'       ! 1c   N.B. the actual seq is the
                                             reverse complement of the
                                             one shown.

(ON_Lam133PCR)      5'-ccTcTgAcTgAgT gcA cAg AGt gc-3'
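
Because the bridge oligonucleotides marked [RC] are listed as the strand that is easiest to read against the codons, the sequence actually synthesized is the reverse complement. A minimal Python sketch of that conversion follows; the function name is hypothetical, and the helper simply ignores the spaces, hyphens and | marks used for layout in the table while preserving upper and lower case.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string, keeping letter case.

    Layout characters (spaces, '-', '|', digits) are dropped first.
    """
    cleaned = "".join(ch for ch in seq if ch.isalpha())
    return cleaned.translate(_COMPLEMENT)[::-1]
```

For example, reverse_complement("ga cag tcg at") returns "atcgactgtc", the strand that would anneal to the 3' part of the VL133-2a2 upper strand shown above.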
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Table 19: 
Cleavage of 75 human light chains.

Enzyme     Recognition*        Nch   Ns   Planned location of site
AfeI       AGCgct                0    0
AflII      Cttaag                0    0   HC FR3
AgeI       Accggt                0    0
AscI       GGcgcgcc              0    0   After LC
BglII      Agatct                0    0
BsiWI      Cgtacg                0    0
BspDI      ATcgat                0    0
BssHII     Gcgcgc                0    0
BstBI      TTcgaa                0    0
DraIII     CACNNNgtg             0    0
EagI       Cggccg                0    0
FseI       GGCCGGcc              0    0
FspI       TGCgca                0    0
HpaI       GTTaac                0    0
MfeI       Caattg                0    0   HC FR1
MluI       Acgcgt                0    0
NcoI       Ccatgg                0    0   Heavy chain signal
NheI       Gctagc                0    0   HC/anchor linker
NotI       GCggccgc              0    0   In linker after HC
NruI       TCGcga                0    0
PacI       TTAATtaa              0    0
PmeI       GTTTaaac              0    0
PmlI       CACgtg                0    0
PvuI       CGATcg                0    0
SacII      CCGCgg                0    0
SalI       Gtcgac                0    0
SfiI       GGCCNNNNnggcc         0    0   Heavy chain signal
SgfI       GCGATcgc              0    0
SnaBI      TACgta                0    0
StuI       AGGcct                0    0
XbaI       Tctaga                0    0   HC FR3
AatII      GACGTc                1    1
AclI       AAcgtt                1    1
AseI       ATtaat                1    1
BsmI       GAATGCN               1    1
BspEI      Tccgga                1    1   HC FR1
BstXI      CCANNNNNntgg          1    1   HC FR2
DrdI       GACNNNNnngtc          1    1
HindIII    Aagctt                1    1
PciI       Acatgt                1    1
SapI       gaagagc               1    1
ScaI       AGTact                1    1
SexAI      Accwggt               1    1
SpeI       Actagt                1    1
TliI       Ctcgag                1    1
XhoI       Ctcgag                1    1
BcgI       cgannnnnntgc          2    2
BlpI       GCtnagc               2    2
BssSI      Ctcgtg                2    2
BstAPI     GCANNNNntgc           2    2
EspI       GCtnagc               2    2
KasI       Ggcgcc                2    2
PflMI      CCANNNNntgg           2    2
XmnI       GAANNnnttc            2    2
ApaLI      Gtgcac                3    3   LC signal seq
NaeI       GCCggc                3    3
NgoMI      Gccggc                3    3
PvuII      CAGctg                3    3
RsrII      CGgwccg               3    3
BsrBI      GAGcgg                4    4
BsrDI      GCAATGNNn             4    4
BstZ17I    GTAtac                4    4
EcoRI      Gaattc                4    4
SphI       GCATGc                4    4
SspI       AATatt                4    4
AccI       GTmkac                5    5
BclI       Tgatca                5    5
BsmBI      Nnnnnngagacg          5    5
BsrGI      Tgtaca                5    5
DraI       TTTaaa                6    6
NdeI       CAtatg                6    6
SwaI       ATTTaaat              6    6
BamHI      Ggatcc                7    7
SacI       GAGCTc                7    7
BciVI      GTATCCNNNNNN          8    8
BsaBI      GATNNnnatc            8    8
NsiI       ATGCAt                8    8
Bsp120I    Gggccc                9    9   CH1
ApaI       GGGCCc                9    9   CH1
PspOOMI    Gggccc                9    9
BspHI      Tcatga                9   11
EcoRV      GATatc                9    9
AhdI       GACNNNnngtc          11   11
BbsI       GAAGAC               11   14
PsiI       TTAtaa               12   12
BsaI       GGTCTCNnnnn          13   15
XmaI       Cccggg               13   14
AvaI       Cycgrg               14   16
BglI       GCCNNNNnggc          14   17
AlwNI      CAGNNNctg            16   16
BspMI      ACCTGC               17   19
XcmI       CCANNNNNnnnntgg      17   26
BstEII     Ggtnacc              19   22   HC FR4
Sse8387I   CCTGCAgg             20   20
AvrII      Cctagg               22   22
HincII     GTYrac               22   22
BsgI       GTGCAG               27   29
MscI       TGGcca               30   34
BseRI      NNnnnnnnnnctcctc     32   35
Bsu36I     CCtnagg              35   37
PstI       CTGCAg               35   40
EciI       nnnnnnnnntccgcc      38   40
PpuMI      RGgwccy              41   50
StyI       Ccwwgg               44   73
EcoO109I   RGgnccy              46   70
Acc65I     Ggtacc               50   51
KpnI       GGTACc               50   51
BpmI       ctccag               53   82
AvaII      Ggwcc                71  124

* cleavage occurs in the top strand after the last upper-case base. For REs
that cut palindromic sequences, the lower strand is cut at the symmetrical
site.
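
A table of this kind can be reproduced for any set of chains with a short scan. The Python sketch below is an illustration under explicit assumptions, not the program used for Table 19: it takes Nch to mean the number of chains containing at least one site and Ns the total number of sites, it searches only the strand given (sufficient for palindromic sites; asymmetric sites would also need the reverse complement searched), and all function names are hypothetical. The degenerate letters follow the standard IUPAC nucleotide codes.

```python
import re

# Standard IUPAC nucleotide codes expressed as regular-expression classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "W": "[AT]", "S": "[CG]",
    "M": "[AC]", "K": "[GT]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def site_regex(recognition):
    """Compile a recognition sequence into a regex; the lookahead lets overlapping sites count."""
    body = "".join(IUPAC[base] for base in recognition.upper())
    return re.compile("(?=" + body + ")")


def top_strand_cut_offset(recognition):
    """Number of bases before the top-strand cut, i.e. just past the last upper-case base.

    Returns None for all-lower-case entries, where the cut lies outside
    the bases shown.
    """
    upper_positions = [i for i, ch in enumerate(recognition) if ch.isupper()]
    if not upper_positions:
        return None
    return upper_positions[-1] + 1


def count_sites(recognition, chains):
    """Return (Nch, Ns): chains with at least one site, and total sites, on the given strand."""
    pattern = site_regex(recognition)
    per_chain = [len(pattern.findall(chain.upper())) for chain in chains]
    nch = sum(1 for n in per_chain if n > 0)
    return nch, sum(per_chain)
```

For instance, top_strand_cut_offset("Cttaag") is 1, reflecting cleavage after the single upper-case C of the AflII entry.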
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Table 20: Cleavage of 79 human heavy chains 



Enzyme     Recognition         Nch   Ns   Planned location of site
AfeI       AGCgct                0    0
AflII      Cttaag                0    0   HC FR3
AscI       GGcgcgcc              0    0   After LC
BsiWI      Cgtacg                0    0
BspDI      ATcgat                0    0
BssHII     Gcgcgc                0    0
FseI       GGCCGGcc              0    0
HpaI       GTTaac                0    0
NheI       Gctagc                0    0   HC Linker
NotI       GCggccgc              0    0   In linker, HC/anchor
NruI       TCGcga                0    0
NsiI       ATGCAt                0    0
PacI       TTAATtaa              0    0
PciI       Acatgt                0    0
PmeI       GTTTaaac              0    0
PvuI       CGATcg                0    0
RsrII      CGgwccg               0    0
SapI       gaagagc               0    0
SfiI       GGCCNNNNnggcc         0    0   HC signal seq
SgfI       GCGATcgc              0    0
SwaI       ATTTaaat              0    0
AclI       AAcgtt                1    1
AgeI       Accggt                1    1
AseI       ATtaat                1    1
AvrII      Cctagg                1    1
BsmI       GAATGCN               1    1
BsrBI      GAGcgg                1    1
BsrDI      GCAATGNNn             1    1
DraI       TTTaaa                1    1
FspI       TGCgca                1    1
HindIII    Aagctt                1    1
MfeI       Caattg                1    1
NaeI       GCCggc                1    1
NgoMI      Gccggc                1    1
SpeI       Actagt                1    1
Acc65I     Ggtacc                2    2
BstBI      TTcgaa                2    2
KpnI       GGTACc                2    2
MluI       Acgcgt                2    2
NcoI       Ccatgg                2    2   In HC signal seq
NdeI       CAtatg                2    2   HC FR4
PmlI       CACgtg                2    2
XcmI       CCANNNNNnnnntgg       2    2
BcgI       cgannnnnntgc          3    3
BclI       Tgatca                3    3
BglI       GCCNNNNnggc           3    3
BsaBI      GATNNnnatc            3    3
BsrGI      Tgtaca                3    3
SnaBI      TACgta                3    3
Sse8387I   CCTGCAgg              3    3
ApaLI      Gtgcac                4    4   LC Signal/FR1
BspHI      Tcatga                4    4
BssSI      Ctcgtg                4    4
PsiI       TTAtaa                4    5
SphI       GCATGc                4    4
AhdI       GACNNNnngtc           5    5
BspEI      Tccgga                5    5   HC FR1
MscI       TGGcca                5    5
SacI       GAGCTc                5    5
ScaI       AGTact                5    5
SexAI      Accwggt               5    6
SspI       AATatt                5    5
TliI       Ctcgag                5    5
XhoI       Ctcgag                5    5
BbsI       GAAGAC                7    8
BstAPI     GCANNNNntgc           7    8
BstZ17I    GTAtac                7    7
EcoRV      GATatc                7    7
EcoRI      Gaattc                8    8
BlpI       GCtnagc               9    9
Bsu36I     CCtnagg               9    9
DraIII     CACNNNgtg             9    9
EspI       GCtnagc               9    9
StuI       AGGcct                9   13
XbaI       Tctaga                9    9   HC FR3
Bsp120I    Gggccc               10   11   CH1
ApaI       GGGCCc               10   11   CH1
PspOOMI    Gggccc               10   11
BciVI      GTATCCNNNNNN         11   11
SalI       Gtcgac               11   12
DrdI       GACNNNNnngtc         12   12
KasI       Ggcgcc               12   12
XmaI       Cccggg               12   14
BglII      Agatct               14   14
HincII     GTYrac               16   18
BamHI      Ggatcc               17   17
PflMI      CCANNNNntgg          17   18
BsmBI      Nnnnnngagacg         18   21
BstXI      CCANNNNNntgg         18   19   HC FR2
XmnI       GAANNnnttc           18   18
SacII      CCGCgg               19   19
PstI       CTGCAg               20   24
PvuII      CAGctg               20   22
AvaI       Cycgrg               21   24
EagI       Cggccg               21   22
AatII      GACGTc               22   22
BspMI      ACCTGC               27   33
AccI       GTmkac               30   43
StyI       Ccwwgg               36   49
AlwNI      CAGNNNctg            38   44
BsaI       GGTCTCNnnnn          38   44
PpuMI      RGgwccy              43   46
BsgI       GTGCAG               44   54
BseRI      NNnnnnnnnnctcctc     48   60
EciI       nnnnnnnnntccgcc      52   57
BstEII     Ggtnacc              54   61   HC FR4, 47/79 have one
EcoO109I   RGgnccy              54   86
BpmI       ctccag               60  121
AvaII      Ggwcc                71  140
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Table 21: MALIA3, annotated 
! MALIA3 9532 bases

   1 aat gct act act att agt aga att gat gcc acc ttt tca gct cgc gcc
     ! gene ii continued
  49 cca aat gaa aat ata gct aaa cag gtt att gac cat ttg cga aat gta
  97 tct aat ggt caa act aaa tct act cgt tcg cag aat tgg gaa tca act
 145 gtt aca tgg aat gaa act tcc aga cac cgt act tta gtt gca tat tta
 193 aaa cat gtt gag cta cag cac cag att cag caa tta agc tct aag cca
 241 tcc gca aaa atg acc tct tat caa aag gag caa tta aag gta ctc tct
 289 aat cct gac ctg ttg gag ttt gct tcc ggt ctg gtt cgc ttt gaa gct
 337 cga att aaa acg cga tat ttg aag tct ttc ggg ctt cct ctt aat ctt
 385 ttt gat gca atc cgc ttt gct tct gac tat aat agt cag ggt aaa gac
 433 ctg att ttt gat tta tgg tca ttc tcg ttt tct gaa ctg ttt aaa gca
 481 ttt gag ggg gat tca ATG aat att tat gac gat tcc gca gta ttg gac
     ! RBS?              Start gene x, ii continues





 529 gct atc cag tct aaa cat ttt act att acc ccc tct ggc aaa act tct
 577 ttt gca aaa gcc tcc cgc cat ttt ggt ttt tat cgt cgt ctg gta aac
 625 gag ggt tat gat agt gtt gct ctt act atg cct cgt aat tcc ttt tgg
 673 cgt tat gta tct gca tta gtt gaa tgt ggt att cct aaa tct caa ctg




/Zl 


atg 


aat 


ctt 


tct 


acc 




aa u 


aac 


gtt 


gtt 


ccg 


tta 


gtt 


cgt 


4-4-4- 
LLL 






 769 aac gta gat ttt tct tcc caa cgt cct gac tgg tat aat gag cca gtt
 817 ctt aaa atc gca TAA
     ! End X & II
 832 ggtaattca ca






























     M1              E5                  Q10                 T15
 843 ATG att aaa gtt gaa att aaa cca tct caa gcc caa ttt act act cgt
     ! Start gene V

     S17         S20                 P25                 E30
 891 tct ggt gtt tct cgt cag ggc aag cct tat tca ctg aat gag cag ctt

             V35                 E40                 V45
 939 tgt tac gtt gat ttg ggt aat gaa tat ccg gtt ctt gtc aag att act

         D50                 A55                 L60
 987 ctt gat gaa ggt cag cca gcc tat gcg cct ggt cTG TAC Acc gtt cat
                                                  BsrGI..

     L65                 V70                 S75                 R80
1035 ctg tcc tct ttc aaa gtt ggt cag ttc ggt tcc ctt atg att gac cgt

                     P85     K87  end of V
1083 ctg cgc ctc gtt ccg gct aag TAA C

1108 ATG gag cag gtc gcg gat ttc gac aca att tat cag gcg atg
     ! Start gene VII

1150 ata caa atc tcc gtt gta ctt tgt ttc gcg ctt ggt ata atc
     ! VII and IX overlap.

                            S2  V3  L4  V5                  S10
1192 gct ggg ggt caa agA TGA gt gtt tta gtg tat tct ttc gcc tct ttc gtt
                         End VII
                       | start IX

     L13     W15                 G20                 T25             E29
1242 tta ggt tgg tgc ctt cgt agt ggc att acg tat ttt acc cgt tta atg gaa

1293 act tcc tc
     .... stop of IX, IX and VIII overlap by four bases
1301 ATG aaa aag tct tta gtc ctc aaa gcc tct gta gcc gtt gct acc ctc
     ! Start signal sequence of viii.

1349 gtt ccg atg ctg tct ttc gct gct gag ggt gac gat ccc gca aaa gcg
                                 mature VIII --->

1397 gcc ttt aac tcc ctg caa gcc tca gcg acc gaa tat atc ggt tat gcg
1445 tgg gcg atg gtt gtt gtc att
1466 gtc ggc gca act atc ggt atc aag ctg ttt aag
1499 aaa ttc acc tcg aaa gca ! 1515

               -35
1517 agc tga taaaccgat acaattaaag gctccttttg
               -10
1552 gagccttttt ttttGGAGAt ttt ! S.D. underlined

     <------ III signal sequence ------
           M   K   K   L   L   F   A   I   P   L   V
1575 caac GTG aaa aaa tta tta ttc gca att cct tta gtt ! 1611

      V   P   F   Y   S   H   S   A   Q
1612 gtt cct ttc tat tct cac aGT gcA Cag tCT
                              ApaLI...



1642 GTC GTG ACG CAG CCG CCC TCA GTG TCT GGG GCC CCA GGG CAG
     AGG GTC ACC ATC TCC TGC ACT GGG AGC AGC TCC AAC ATC GGG GCA
       BstEII...
1729 GGT TAT GAT GTA CAC TGG TAC CAG CAG CTT CCA GGA ACA GCC CCC AAA
1777 CTC CTC ATC TAT GGT AAC AGC AAT CGG CCC TCA GGG GTC CCT GAC CGA
1825 TTC TCT GGC TCC AAG TCT GGC ACC TCA GCC TCC CTG GCC ATC ACT
1870 GGG CTC CAG GCT GAG GAT GAG GCT GAT TAT
1900 TAC TGC CAG TCC TAT GAC AGC AGC CTG AGT
1930 GGC CTT TAT GTC TTC GGA ACT GGG ACC AAG GTC ACC GTC
                                           BstEII...
1969 CTA GGT CAG CCC AAG GCC AAC CCC ACT GTC ACT
2002 CTG TTC CCG CCC TCC TCT GAG GAG CTC CAA GCC AAC AAG GCC ACA CTA
2050 GTG TGT CTG ATC AGT GAC TTC TAC CCG GGA GCT GTG ACA GTG GCC TGG
2098 AAG GCA GAT AGC AGC CCC GTC AAG GCG GGA GTG GAG ACC ACC ACA CCC
2146 TCC AAA CAA AGC AAC AAC AAG TAC GCG GCC AGC AGC TAT CTG AGC CTG
2194 ACG CCT GAG CAG TGG AAG TCC CAC AGA AGC TAC AGC TGC CAG GTC ACG
2242 CAT GAA GGG AGC ACC GTG GAG AAG ACA GTG GCC CCT ACA GAA TGT TCA
2290 TAA TAA ACCG CCTCCACCG G GCGCGCCA AT TCTATTTCAA GGAGACAGTC ATA
                            AscI.....

     PelB signal ---->
      M   K   Y   L   L   P   T   A   A   A   G   L   L   L   L
2343 ATG AAA TAC CTA TTG CCT ACG GCA GCC GCT GGA TTG TTA TTA CTC

     16  17  18  19  20  21  22
      A   A   Q   P   A   M   A
2388 gcG GCC cag ccG GCC atg gcc
       SfiI.............
               NgoMI..(1/2)
                      NcoI....
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FRl (DP47/V3-23) 

        23  24  25  26  27  28  29  30
        E   V   Q   L   L   E   S   G
2409  |gaa|gtt|CAA|TTG|tta|gag|tct|ggt|
              | MfeI  |

      ----------------------------FR1-----------------------------
        31  32  33  34  35  36  37  38  39  40  41  42  43  44  45
        G   G   L   V   Q   P   G   G   S   L   R   L   S   C   A
2433  |ggc|ggt|ctt|gtt|cag|cct|ggt|ggt|tct|tta|cgt|ctt|tct|tgc|gct|

      ------------FR1----------->|.......CDR1........|----FR2----
        46  47  48  49  50  51  52  53  54  55  56  57  58  59  60
        A   S   G   F   T   F   S   S   Y   A   M   S   W   V   R
2478  |gct|TCC|GGA|ttc|act|ttc|tct|tCG|TAC|Gct|atg|tct|tgg|gtt|cgC|
          | BspEI |                | BsiWI |                    |BstXI.

      --------------------FR2-------------------->|.....CDR2......
        61  62  63  64  65  66  67  68  69  70  71  72  73  74  75
        Q   A   P   G   K   G   L   E   W   V   S   A   I   S   G
2523  |CAa|gct|ccT|GGt|aaa|ggt|ttg|gag|tgg|gtt|tct|gct|atc|tct|ggt|
      ...BstXI      |

      ......................CDR2..........................|--FR3--
        76  77  78  79  80  81  82  83  84  85  86  87  88  89  90
        S   G   G   S   T   Y   Y   A   D   S   V   K   G   R   F
2568  |tct|ggt|ggc|agt|act|tac|tat|gct|gac|tcc|gtt|aaa|ggt|cgc|ttc|

      ----------------------------FR3-----------------------------
        91  92  93  94  95  96  97  98  99 100 101 102 103 104 105
        T   I   S   R   D   N   S   K   N   T   L   Y   L   Q   M
2613  |act|atc|TCT|AGA|gac|aac|tct|aag|aat|act|ctc|tac|ttg|cag|atg|
              | XbaI  |

      ----------------------------FR3--------------------------->|
       106 107 108 109 110 111 112 113 114 115 116 117 118 119 120
        N   S   L   R   A   E   D   T   A   V   Y   Y   C   A   K
2658  |aac|agC|TTA|AGg|gct|gag|gac|aCT|GCA|Gtc|tac|tat|tgc|gct|aaa|
            |AflII  |              | PstI  |

      ....................CDR3....................|------FR4------
       121 122 123 124 125 126 127 128 129 130 131 132 133 134 135
        D   Y   E   G   T   G   Y   A   F   D   I   W   G   Q   G
2703  |gac|tat|gaa|ggt|act|ggt|tat|gct|ttc|gaC|ATA|TGg|ggt|caa|ggt|
                                            | NdeI  | (1/4)

      -------------FR4---------->|
       136 137 138 139 140 141 142
        T   M   V   T   V   S   S
2748  |act|atG|GTC|ACC|gtc|tct|agt
            | BstEII |

From BstEII onwards, pV323 is same as pCES1, except as noted.

BstEII sites may occur in light chains; not likely to be unique in final
vector.

       143 144 145 146 147 148 149 150 151 152
        A   S   T   K   G   P   S   V   F   P
2769   gcc tcc acc aaG GGC CCa tcg GTC TTC ccc
                    Bsp120I.      BbsI...(2/2)
                    ApaI....


     153 154 155 156 157 158 159 160 161 162 163 164 165 166 167
      L   A   P   S   S   K   S   T   S   G   G   T   A   A   L
2799 ctg gca ccC TCC TCc aag agc acc tct ggg ggc aca gcg gcc ctg
               BseRI...(2/2)

     168 169 170 171 172 173 174 175 176 177 178 179 180 181 182
      G   C   L   V   K   D   Y   F   P   E   P   V   T   V   S
2844 ggc tgc ctg GTC AAG GAC TAC TTC CCc gaA CCG GTg acg gtg tcg
                                           AgeI....

     183 184 185 186 187 188 189 190 191 192 193 194 195 196 197
      W   N   S   G   A   L   T   S   G   V   H   T   F   P   A
2889 tgg aac tca GGC GCC ctg acc agc ggc gtc cac acc ttc ccg gct
                 KasI...(1/4)

     198 199 200 201 202 203 204 205 206 207 208 209 210 211 212
      V   L   Q   S   S   G   L   Y   S   L   S   S   V   V   T
2934 gtc cta cag tct agc GGa ctc tac tcc ctc agc agc gta gtg acc
                     (Bsu36I...) (knocked out)

     213 214 215 216 217 218 219 220 221 222 223 224 225 226 227
      V   P   S   S   S   L   G   T   Q   T   Y   I   C   N   V
2979 gtg ccC tct tct agc tTG Ggc acc cag acc tac atc tgc aac gtg
           (BstXI...........) N.B. destruction of BstXI & BpmI

     228 229 230 231 232 233 234 235 236 237 238 239 240 241 242
      N   H   K   P   S   N   T   K   V   D   K   K   V   E   P
3024 aat cac aag ccc agc aac acc aag gtg gac aag aaa gtt gag ccc

     243 244 245
      K   S   C   A   A   A   H   H   H   H   H   H   S   A
3069 aaa tct tgt GCG GCC GCt cat cac cac cat cat cac tct gct
                 NotI......

      E   Q   K   L   I   S   E   E   D   L   N   G   A   A
3111 gaa caa aaa ctc atc tca gaa gag gat ctg aat ggt gcc gca

      D   I   N   D   D   R   M   A   S   G   A
3153 GAT ATC aac gat gat cgt atg gct AGC ggc gcc
     rEK cleavage site.
     EcoRV..                     NheI... KasI.

     Domain 1

      A   E   T   V   E   S   C   L   A
3183 gct gaa act gtt gaa agt tgt tta gca

      K   P   H   T   E   N   S   F
3210 aaa ccc cat aca gaa aat tca ttt

      T   N   V   W   K   D   D   K   T
3234 aCT AAC GTC TGG AAA GAC GAC AAA ACt

      L   D   R   Y   A   N   Y   E   G   C   L   W   N   A   T   G   V
3261 tta gat cgt tac gct aac tat gag ggt tgt ctg tgG AAT GCt aca ggc gtt
                                                   BsmI....

      V   V   C   T   G   D   E   T   Q   C   Y   G   T   W   V   P   I
3312 gta gtt tgt act ggt GAC GAA ACT CAG TGT TAC GGT ACA TGG GTT cct att

      G   L   A   I   P   E   N
3363 ggg ctt gct atc cct gaa aat

     L1 linker

      E   G   G   G   S   E   G   G   G   S
3384 gag ggt ggt ggc tct gag ggt ggc ggt tct

      E   G   G   G   S   E   G   G   G   T
3414 gag ggt ggc ggt tct gag ggt ggc ggt act



     Domain 2

3444 aaa cct cct gag tac ggt gat aca cct att ccg ggc tat act tat atc aac
3495 cct ctc gac ggc act tat ccg cct ggt act gag caa aac ccc gct aat cct
3546 aat cct tct ctt GAG GAG tct cag cct ctt aat act ttc atg ttt cag aat
                     BseRI..
3597 aat agg ttc cga aat agg cag ggg gca tta act gtt tat acg ggc act
3645 gtt act caa ggc act gac ccc gtt aaa act tat tac cag tac act cct
3693 gta tca tca aaa gcc atg tat gac gct tac tgg aac ggt aaa ttc AGA
                                                                 AlwNI
3741 GAC TGc gct ttc cat tct ggc ttt aat gaa gat cca ttc gtt tgt gaa
     AlwNI
3789 tat caa ggc caa tcg tct gac ctg cct caa cct cct gtc aat gct

3834 ggc ggc ggc tct
     ! start L2
3846 ggt ggt ggt tct
3858 ggt ggc ggc tct
3870 gag ggt ggt ggc tct gag ggt ggc ggt tct
3900 gag ggt ggc ggc tct gag gga ggc ggt tcc
     ! end L2
3930 ggt ggt ggc tct ggt
     Domain 3

      S   G   D   F   D   Y   E   K   M   A   N   A   N   K   G   A
3945 tcc ggt gat ttt gat tat gaa aag atg gca aac gct aat aag ggg gct

      M   T   E   N   A   D   E   N   A   L   Q   S   D   A   K   G
3993 atg acc gaa aat gcc gat gaa aac gcg cta cag tct gac gct aaa ggc

      K   L   D   S   V   A   T   D   Y   G   A   A   I   D   G   F
4041 aaa ctt gat tct gtc gct act gat tac ggt gct gct atc gat ggt ttc

      I   G   D   V   S   G   L   A   N   G   N   G   A   T   G   D
4089 att ggt gac gtt tcc ggc ctt gct aat ggt aat ggt gct act ggt gat

      F   A   G   S   N   S   Q   M   A   Q   V   G   D   G   D   N
4137 ttt gct ggc tct aat tcc caa atg gct caa gtc ggt gac ggt gat aat

      S   P   L   M   N   N   F   R   Q   Y   L   P   S   L   P   Q
4185 tca cct tta atg aat aat ttc cgt caa tat tta cct tcc ctc cct caa

      S   V   E   C   R   P   F   V   F   S   A   G   K   P   Y   E
4233 tcg gtt gaa tgt cgc cct ttt gtc ttt agc gct ggt aaa cca tat gaa

      F   S   I   D   C   D   K   I   N   L   F   R
4281 ttt tct att gat tgt gac aaa ata aac tta ttc cgt
     ! End Domain 3

      G   V   F   A   F   L   L   Y   V   A   T   F   M   Y   V   F
4317 ggt gtc ttt gcg ttt ctt tta tat gtt gcc acc ttt atg tat gta ttt
     ! start transmembrane segment

      S   T   F   A   N   I   L
4365 tct acg ttt gct aac ata ctg

      R   N   K   E   S
4386 cgt aat aag gag tct TAA ! stop of iii
     ! Intracellular anchor.


























        M1  P2  V   L   L5  G   I   P   L   L10 L   R   F   L   G15
4404 tc ATG cca gtt ctt ttg ggt att ccg tta tta ttg cgt ttc ctc ggt
        ! Start VI

4451 ttc ctt ctg gta act ttg ttc ggc tat ctg ctt act ttt ctt aaa aag
4499 ggc ttc ggt aag ata gct att gct att tca ttg ttt ctt gct ctt att
4547 att ggg ctt aac tca att ctt gtg ggt tat ctc tct gat att agc gct
4595 caa tta ccc tct gac ttt gtt cag ggt gtt cag tta att ctc ccg tct
4643 aat gcg ctt ccc tgt ttt tat gtt att ctc tct gta aag gct gct att
4691 ttc att ttt gac gtt aaa caa aaa atc gtt tct tat ttg gat tgg gat

               M1  A2  V3      F5                  L10         G13
4739 aaa TAA t ATG gct gtt tat ttt gta act ggc aaa tta ggc tct gga
         end VI Start gene I





14 


15 


16 


17 


18 


19 


20 


21 


22 


23 


24 


25 


26 


27 


28 




K 


T 


L 


V 


S 


V 


G 


K 


I 


Q 


D 


K 


I 


V 


A 


4785 


aag 


acg 


etc 


gtt 


age 


gtt 


ggt 


aag 


att 


cag 


gat 


aaa 


att 


gta 


get 




29 


30 


31 


32 


33 


34 


35 


36 


37 


38 


39 


40 


41 


42 


43 




G 


C 


K 


I 


A 


T 


N 


L 


D 


L 


R 


L 


Q 


N 


L 


4830 


ggg 


tgc 


aaa 


ata 


gca 


act 


aat 


ctt 


gat 


tta 


agg 


ctt 


caa 


aac 


etc 




44 


45 


46 


47 


48 


49 


50 


51 


52 


53 


54 


55 


56 


57 


58 




p 


Q 


V 


G 


R 


F 


A 


K 


T 


P 


R 


V 


L 


R 


I 


4875 


ccg 


caa 


gtc 


ggg 


agg 


ttc 


get 


aaa 


acg 


cct 


cgc 


gtt 


ctt 


aga 


ata 




59 


60 


61 


62 


63 


64 


65 


66 


67 


68 


69 


70 


71 


72 


73 




P 


D 


K 


p 


S 


I 


S 


D 


L 


L 


A 


I 


G 


R 


G 


4920 


ccg 


gat 


aag 


cct 


tct 


ata 


tct 


gat 


ttg 


ctt 


get 


att 


ggg 


cgc 


ggt 




74 


75 


76 


77 


78 


79 


80 


81 


82 


83 


84 


85 


86 


87 


88 




N 


D 


S 


Y 


D 


E 


N 


K 


N 


G 


L 


L 


V 


L 


D 


4965 


aat 


gat 


tec 


tac 


gat 


gaa 


aat 


aaa 


aac 


ggc 


ttg 


ctt 


gtt 


etc 


gat 




89 


90 


91 


92 


93 


94 


95 


96 


97 


98 


99 


100 


101 


102 


103 




E 


C 


G 


T 


W 


F 


N 


T 


R 


S 


W 


N 


D 


K 


E 


5010 


gag 


tgc 


ggt 


act 


tgg 


ttt 


aat 


ace 


cgt 


tct 


tgg 


aat 


gat 


aag 


gaa 



104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 
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a. i 

rn 

U"! 

fSJi. 

w 



10 



15 



20 



25 



30 



35 



40 



45 



50 



55 





R 


Q 


P 


I 


I 


D 


W 


F 


L 


H 


A 


R 


K 


T 
L 




5055 


aga 


cag 


ccg 


att 


att 


gat 


tgg 


ttt 


eta 


cat 


get 


cgt 


aaa 


tta 


gga 




     119 120 121 122 123 124 125 126 127 128 129 130 131 132 133
      W   D   I   I   F   L   V   Q   D   L   S   I   V   D   K
5100 tgg gat att att ttt ctt gtt cag gac tta tct att gtt gat aaa

     134 135 136 137 138 139 140 141 142 143 144 145 146 147 148
      Q   A   R   S   A   L   A   E   H   V   V   Y   C   R   R
5145 cag gcg cgt tct gca tta gct gaa cat gtt gtt tat tgt cgt cgt

     149 150 151 152 153 154 155 156 157 158 159 160 161 162 163
      L   D   R   I   T   L   P   F   V   G   T   L   Y   S   L
5190 ctg gac aga att act tta cct ttt gtc ggt act tta tat tct Ctt

     164 165 166 167 168 169 170 171 172 173 174 175 176 177 178
      I   T   G   S   K   M   P   L   P   K   L   H   V   G   V
5235 att act ggc tcg aaa atg cct ctg cct aaa tta cat gtt ggc gtt

     179 180 181 182 183 184 185 186 187 188 189 190 191 192 193
      V   K   Y   G   D   S   Q   L   S   P   T   V   E   R   W
5280 gtt aaa tat ggc gat tct caa tta agc cct act gtt gag cgt tgg




     194 195 196 197 198 199 200 201 202 203 204 205 206 207 208
      L   Y   T   G   K   N   L   Y   N   A   Y   D   T   K   Q
5325 ctt tat act ggt aag aat ttg tat aac gca tat gat act aaa cag

     209 210 211 212 213 214 215 216 217 218 219 220 221 222 223
      A   F   S   S   N   Y   D   S   G   V   Y   S   Y   L   T
5370 gct ttt tct agt aat tat gat tcc ggt gtt tat tct tat tta acg

     224 225 226 227 228 229 230 231 232 233 234 235 236 237 238
      P   Y   L   S   H   G   R   Y   F   K   P   L   N   L   G
5415 cct tat tta tca cac ggt cgg tat ttc aaa cca tta aat tta ggt

     239 240 241 242 243 244 245 246 247 248 249 250 251 252 253
      Q   K   M   K   L   T   K   I   Y   L   K   K   F   S   R
5460 cag aag atg aaa tta act aaa ata tat ttg aaa aag ttt tct cgc

     254 255 256 257 258 259 260 261 262 263 264 265 266 267 268
      V   L   C   L   A   I   G   F   A   S   A   F   T   Y   S
5505 gtt Ctt tgt ctt gcg att gga ttt gca tca gca ttt aca tat agt




     269 270 271 272 273 274 275 276 277 278 279 280 281 282 283
      Y   I   T   Q   P   K   P   E   V   K   K   V   V   S   Q
5550 tat ata acc caa cct aag ccg gag gtt aaa aag gta gtc tct cag

     284 285 286 287 288 289 290 291 292 293 294 295 296 297 298
      T   Y   D   F   D   K   F   T   I   D   S   S   Q   R   L
5595 acc tat gat ttt gat aaa ttc act att gac tct tct cag cgt ctt

     299 300 301 302 303 304 305 306 307 308 309 310 311 312 313
      N   L   S   Y   R   Y   V   F   K   D   S   K   G   K   L
5640 aat cta agc tat cgc tat gtt ttc aag gat tct aag gga aaa TTA
                                                             PacI

     314 315 316 317 318 319 320 321 322 323 324 325 326 327 328
      I   N   S   D   D   L   Q   K   Q   G   Y   S   L   T   Y
5685 ATT AAt agc gac gat tta cag aag caa ggt tat tca ctc aca tat
     PacI

     329 330 331 332 333 334 335 336 337 338 339 340 341 342 343
i     I   D   L   C   T   V   S   I   K   K   G   N   S   N   E
iv                                                         M1  K
5730 att gat tta tgt act gtt tcc att aaa aaa ggt aat tca aAT Gaa
                                                          Start IV

     344 345 346 347 348 349
i     I   V   K   C   N   .   End of I
iv     L3  L   N5  V   I7  N    F   V10
5775 att gtt aaa tgt aat TAA T TTT GTT
                               IV






























5800 ttc ttg atg ttt gtt tca tca tct tct ttt gct cag gta att gaa atg
5848 aat aat tcg cct ctg cgc gat ttt gta act tgg tat tca aag caa tca
5896 ggc gaa tcc gtt att gtt tct ccc gat gta aaa ggt act gtt act gta
5944 tat tca tct gac gtt aaa cct gaa aat cta cgc aat ttc ttt att tct
5992 gtt tta cgt gct aat aat ttt gat atg gtt ggt tca att cct tcc ata
6040 att cag aag tat aat cca aac aat cag gat tat att gat gaa ttg cca
6088 tca tct gat aat cag gaa tat gat gat aat tcc gct cct tct ggt ggt
6136 ttc ttt gtt ccg caa aat gat aat gtt act caa act ttt aaa att aat
6184 aac gtt cgg gca aag gat tta ata cga gtt gtc gaa ttg ttt gta aag
6232 tct aat act tct aaa tcc tca aat gta tta tct att gac ggc tct aat
6280 cta tta gtt gtt TCT gca cct aaa gat att tta gat aac ctt cct caa
                     ApaLI removed
6328 ttc ctt tct act gtt gat ttg cca act gac cag ata ttg att gag ggt
6376 ttg ata ttt gag gtt cag caa ggt gat gct tta gat ttt tca ttt gct
6424 gct ggc tct cag cgt ggc act gtt gca ggc ggt gtt aat act gac cgc
6472 ctc acc tct gtt tta tct tct gct ggt ggt tcg ttc ggt att ttt aat
6520 ggc gat gtt tta ggg cta tca gtt cgc gca tta aag act aat agc cat
6568 tca aaa ata ttg tct gtg cca cgt att ctt acg ctt tca ggt cag aag
6616 ggt tct atc tct gtT GGC CAg aat gtc cct ttt att act ggt cgt gtg
                       MscI
6664 act ggt gaa tct gcc aat gta aat aat cca ttt cag acg att gag cgt
6712 caa aat gta ggt att tcc atg agc gtt ttt cct gtt gca atg gct ggc
6760 ggt aat att gtt ctg gat att acc agc aag gcc gat agt ttg agt tct
6808 tct act cag gca agt gat gtt att act aat caa aga agt att gct aca
6856 acg gtt aat ttg cgt gat gga cag act ctt tta ctc ggt ggc ctc act
6904 gat tat aaa aac act tct caa gat tct ggc gta ccg ttc ctg tct aaa
6952 atc cct tta atc ggc ctc ctg ttt agc tcc cgc tct gat tcc aac gag
7000 gaa agc acg tta tac gtg ctc gtc aaa gca acc ata gta cgc gcc ctg
7048 TAG cggcgcatt
     End IV































7060 aagcgcggcg ggtgtggtgg ttacgcgcag cgtgaccgct acacttgcca gcgccctagc
7120 gcccgctcct ttcgctttct tcccttcctt tctcgccacg ttcGCCGGCt ttccccgtca
                                                    NgoMI_
7180 agctctaaat cgggggctcc ctttagggtt ccgatttagt gctttacggc acctcgaccc
7240 caaaaaactt gatttgggtg atggttCACG TAGTGggcca tcgccctgat agacggtttt
                                 DraIII
7300 tcgccctttG ACGTTGGAGT Ccacgttctt taatagtgga ctcttgttcc aaactggaac
              DrdI
7360 aacactcaac cctatctcgg gctattcttt tgatttataa gggattttgc cgatttcgga
7420 accaccatca aacaggattt tcgcctgctg gggcaaacca gcgtggaccg cttgctgcaa
7480 ctctctcagg gccaggcggt gaagggcaat CAGCTGttgc cCGTCTCact ggtgaaaaga
                                      PvuII.      BsmBI.
7540 aaaaccaccc tGGATCC AAGCTT
                 BamHI  HindIII (1/2)

     Insert carrying bla gene
7563 gcaggtg gcacttttcg gggaaatgtg cgcggaaccc
7600 ctatttgttt atttttctaa atacattcaa atatGTATCC gctcatgaga caataaccct
                                          BciVI
7660 gataaatgct tcaataatat tgaaaaAGGA AGAgt
                                 RBS.?...
     Start bla gene
7695 ATG agt att caa cat ttc cgt gtc gcc ctt att ccc ttt ttt gcg gca ttt
7746 tgc ctt cct gtt ttt gct cac cca gaa acg ctg gtg aaa gta aaa gat gct
7797 gaa gat cag ttg ggC gCA CGA Gtg ggt tac atc gaa ctg gat ctc aac agc
                          BssSI...
                      ApaLI removed
7848 ggt aag atc ctt gag agt ttt cgc ccc gaa gaa cgt ttt cca atg atg agc
7899 act ttt aaa gtt ctg cta tgt cat aca cta tta tcc cgt att gac gcc ggg
7950 caa gaG CAA CTC GGT CGc cgg gcg cgg tat tct cag aat gac ttg gtt gAG
           BcgI                                                       ScaI
8001 TAC Tca cca gtc aca gaa aag cat ctt acg gat ggc atg aca gta aga gaa
     ScaI_
8052 tta tgc agt gct gcc ata acc atg agt gat aac act gcg gcc aac tta ctt
8103 ctg aca aCG ATC Gga gga ccg aag gag cta acc gct ttt ttg cac aac atg
              PvuI
8154 ggg gat cat gta act cgc ctt gat cgt tgg gaa ccg gag ctg aat gaa gcc
8205 ata cca aac gac gag cgt gac acc acg atg cct gta gca atg cca aca acg
8256 tTG CGC Aaa cta tta act ggc gaa cta ctt act cta gct tcc cgg caa caa
      FspI....
8307 tta ata gac tgg atg gag gcg gat aaa gtt gca gga cca ctt ctg cgc tcg
8358 GCC ctt ccG GCt ggc tgg ttt att gct gat aaa tct gga gcc ggt gag cgt
     BglI
8409 gGG TCT Cgc ggt atc att gca gca ctg ggg cca gat ggt aag ccc tcc cgt
      BsaI
8460 atc gta gtt atc tac acG ACg ggg aGT Cag gca act atg gat gaa cga aat
                           AhdI
8511 aga cag atc gct gag ata ggt gcc tca ctg att aag cat tgg TAA ctgt
                                                             stop
8560 cagaccaagt ttactcatat atactttaga ttgatttaaa acttcatttt taatttaaaa
8620 ggatctaggt gaagatcctt tttgataatc tcatgaccaa aatcccttaa cgtgagtttt
8680 cgttccactg tacgtaagac cccc
8704 AAGCTT GTCGAC tgaa tggcgaatgg cgctttgcct
     HindIII SalI..
     (2/2)   HincII
8740 ggtttccggc accagaagcg gtgccggaaa gctggctgga gtgcgatctt
8790 CCTGAGG
     Bsu36I_
8797 ccgat actgtcgtcg tcccctcaaa ctggcagatg
8832 cacggttacg atgcgcccat ctacaccaac gtaacctatc ccattacggt caatccgccg
8892 tttgttccca cggagaatcc gacgggttgt tactcgctca catttaatgt tgatgaaagc
8952 tggctacagg aaggccagac gcgaattatt tttgatggcg ttcctattgg ttaaaaaatg
9012 agctgattta acaaaaattt aacgcgaatt ttaacaaaat attaacgttt acaATTTAAA
                                                               SwaI...
9072 Tatttgctta tacaatcttc ctgtttttgg ggcttttctg attatcaacc GGGGTAcat
                                                            RBS?
9131 ATG att gac atg cta gtt tta cga tta ccg ttc atc gat tct ctt gtt tgc
     Start gene II
9182 tcc aga ctc tca ggc aat gac ctg ata gcc ttt gtA GAT CTc tca aaa ata
                                                   BglII...
9233 gct acc ctc tcc ggc atg aat tta tca gct aga acg gtt gaa tat cat att
9284 gat ggt gat ttg act gtc tcc ggc ctt tct cac cct ttt gaa tct tta cct
9335 aca cat tac tca ggc att gca ttt aaa ata tat gag ggt tct aaa aat ttt
9386 tat cct tgc gtt gaa ata aag gct tct ccc gca aaa gta tta cag ggt cat
9437 aat gtt ttt ggt aca acc gat tta gct tta tgc tct gag gct tta ttg ctt
9488 aat ttt gct aat tct ttg cct tgc ctg tat gat tta ttg gat gtt ! 9532
     gene II continues






Table 21B: Sequence of MALIA3, condensed

LOCUS MALIA3 9532 CIRCULAR

ORIGIN 



1 AATGCTACTA CTATTAGTAG AATTGATGCC ACCTTTTCAG CTCGCGCCCC AAATGAAAAT 

61 ATAGCTAAAC AGGTTATTGA CCATTTGCGA AATGTATCTA ATGGTCAAAC TAAATCTACT

121 CGTTCGCAGA ATTGGGAATC AACTGTTACA TGGAATGAAA CTTCCAGACA CCGTACTTTA 

181 GTTGCATATT TAAAACATGT TGAGCTACAG CACCAGATTC AGCAATTAAG CTCTAAGCCA 

241 TCCGCAAAAA TGACCTCTTA TCAAAAGGAG CAATTAAAGG TACTCTCTAA TCCTGACCTG 

361 TCTTTCGGGC TTCCTCTTAA TCTTTTTGAT GCAATCCGCT TTGCTTCTGA CTATAATAGT 

421 CAGGGTAAAG ACCTGATTTT TGATTTATGG TCATTCTCGT TTTCTGAACT GTTTAAAGCA

481 TTTGAGGGGG ATTCAATGAA TATTTATGAC GATTCCGCAG TATTGGACGC TATCCAGTCT 

541 AAACATTTTA CTATTACCCC CTCTGGCAAA ACTTCTTTTG CAAAAGCCTC TCGCTATTTT 

601 GGTTTTTATC GTCGTCTGGT AAACGAGGGT TATGATAGTG TTGCTCTTAC TATGCCTCGT 

661 AATTCCTTTT GGCGTTATGT ATCTGCATTA GTTGAATGTG GTATTCCTAA ATCTCAACTG 

721 ATGAATCTTT CTACCTGTAA TAATGTTGTT CCGTTAGTTC GTTTTATTAA CGTAGATTTT

781 TCTTCCCAAC GTCCTGACTG GTATAATGAG CCAGTTCTTA AAATCGCATA AGGTAATTCA 

841 CAATGATTAA AGTTGAAATT AAACCATCTC AAGCCCAATT TACTACTCGT TCTGGTGTTT 

901 CTCGTCAGGG CAAGCCTTAT TCACTGAATG AGCAGCTTTG TTACGTTGAT TTGGGTAATG 

961 AATATCCGGT TCTTGTCAAG ATTACTCTTG ATGAAGGTCA GCCAGCCTAT GCGCCTGGTC

1021 TGTACACCGT TCATCTGTCC TCTTTCAAAG TTGGTCAGTT CGGTTCCCTT ATGATTGACC

1081 GTCTGCGCCT CGTTCCGGCT AAGTAACATG GAGCAGGTCG CGGATTTCGA CACAATTTAT

1141 CAGGCGATGA TACAAATCTC CGTTGTACTT TGTTTCGCGC TTGGTATAAT CGCTGGGGGT

1201 CAAAGATGAG TGTTTTAGTG TATTCTTTCG CCTCTTTCGT TTTAGGTTGG TGCCTTCGTA

1261 GTGGCATTAC GTATTTTACC CGTTTAATGG AAACTTCCTC ATGAAAAAGT CTTTAGTCCT

1321 CAAAGCCTCT GTAGCCGTTG CTACCCTCGT TCCGATGCTG TCTTTCGCTG CTGAGGGTGA

1381 CGATCCCGCA AAAGCGGCCT TTAACTCCCT GCAAGCCTCA GCGACCGAAT ATATCGGTTA

1441 TGCGTGGGCG ATGGTTGTTG TCATTGTCGG CGCAACTATC GGTATCAAGC TGTTTAAGAA

1501 ATTCACCTCG AAAGCAAGCT GATAAACCGA TACAATTAAA GGCTCCTTTT GGAGCCTTTT

1561 TTTTTGGAGA TTTTCAACGT GAAAAAATTA TTATTCGCAA TTCCTTTAGT TGTTCCTTTC

1621 TATTCTCACA GTGCACAGTC TGTCGTGACG CAGCCGCCCT CAGTGTCTGG GGCCCCAGGG

1681 CAGAGGGTCA CCATCTCCTG CACTGGGAGC AGCTCCAACA TCGGGGCAGG TTATGATGTA

1741 CACTGGTACC AGCAGCTTCC AGGAACAGCC CCCAAACTCC TCATCTATGG TAACAGCAAT

1801 CGGCCCTCAG GGGTCCCTGA CCGATTCTCT GGCTCCAAGT CTGGCACCTC AGCCTCCCTG

1861 GCCATCACTG GGCTCCAGGC TGAGGATGAG GCTGATTATT ACTGCCAGTC CTATGACAGC

1921 AGCCTGAGTG GCCTTTATGT CTTCGGAACT GGGACCAAGG TCACCGTCCT AGGTCAGCCC

1981 AAGGCCAACC CCACTGTCAC TCTGTTCCCG CCCTCCTCTG AGGAGCTCCA AGCCAACAAG

2041 GCCACACTAG TGTGTCTGAT CAGTGACTTC TACCCGGGAG CTGTGACAGT GGCCTGGAAG 

2101 GCAGATAGCA GCCCCGTCAA GGCGGGAGTG GAGACCACCA CACCCTCCAA ACAAAGCAAC 

2161 AACAAGTACG CGGCCAGCAG CTATCTGAGC CTGACGCCTG AGCAGTGGAA GTCCCACAGA 

2221 AGCTACAGCT GCCAGGTCAC GCATGAAGGG AGCACCGTGG AGAAGACAGT GGCCCCTACA

2281 GAATGTTCAT AATAAACCGC CTCCACCGGG CGCGCCAATT CTATTTCAAG GAGACAGTCA 

2341 TAATGAAATA CCTATTGCCT ACGGCAGCCG CTGGATTGTT ATTACTCGCG GCCCAGCCGG 

2401 CCATGGCCGA AGTTCAATTG TTAGAGTCTG GTGGCGGTCT TGTTCAGCCT GGTGGTTCTT 

2461 TACGTCTTTC TTGCGCTGCT TCCGGATTCA CTTTCTCTTC GTACGCTATG TCTTGGGTTC 

2521 GCCAAGCTCC TGGTAAAGGT TTGGAGTGGG TTTCTGCTAT CTCTGGTTCT GGTGGCAGTA

2581 CTTACTATGC TGACTCCGTT AAAGGTCGCT TCACTATCTC TAGAGACAAC TCTAAGAATA 

2641 CTCTCTACTT GCAGATGAAC AGCTTAAGGG CTGAGGACAC TGCAGTCTAC TATTGCGCTA 

2701 AAGACTATGA AGGTACTGGT TATGCTTTCG ACATATGGGG TCAAGGTACT ATGGTCACCG 

2761 TCTCTAGTGC CTCCACCAAG GGCCCATCGG TCTTCCCCCT GGCACCCTCC TCCAAGAGCA 

2821 CCTCTGGGGG CACAGCGGCC CTGGGCTGCC TGGTCAAGGA CTACTTCCCC GAACCGGTGA

2881 CGGTGTCGTG GAACTCAGGC GCCCTGACCA GCGGCGTCCA CACCTTCCCG GCTGTCCTAC 

2941 AGTCTAGCGG ACTCTACTCC CTCAGCAGCG TAGTGACCGT GCCCTCTTCT AGCTTGGGCA 

3001 CCCAGACCTA CATCTGCAAC GTGAATCACA AGCCCAGCAA CACCAAGGTG GACAAGAAAG 

3061 TTGAGCCCAA ATCTTGTGCG GCCGCTCATC ACCACCATCA TCACTCTGCT GAACAAAAAC 

3121 TCATCTCAGA AGAGGATCTG AATGGTGCCG CAGATATCAA CGATGATCGT ATGGCTGGCG

3181 CCGCTGAAAC TGTTGAAAGT TGTTTAGCAA AACCCCATAC AGAAAATTCA TTTACTAACG 

3241 TCTGGAAAGA CGACAAAACT TTAGATCGTT ACGCTAACTA TGAGGGTTGT CTGTGGAATG
3301 CTACAGGCGT TGTAGTTTGT ACTGGTGACG AAACTCAGTG TTACGGTACA TGGGTTCCTA
3361 TTGGGCTTGC TATCCCTGAA AATGAGGGTG GTGGCTCTGA GGGTGGCGGT TCTGAGGGTG
3421 GCGGTTCTGA GGGTGGCGGT ACTAAACCTC CTGAGTACGG TGATACACCT ATTCCGGGCT
3481 ATACTTATAT CAACCCTCTC GACGGCACTT ATCCGCCTGG TACTGAGCAA AACCCCGCTA
3541 ATCCTAATCC TTCTCTTGAG GAGTCTCAGC CTCTTAATAC TTTCATGTTT CAGAATAATA
3601 GGTTCCGAAA TAGGCAGGGG GCATTAACTG TTTATACGGG CACTGTTACT CAAGGCACTG
3661 ACCCCGTTAA AACTTATTAC CAGTACACTC CTGTATCATC AAAAGCCATG TATGACGCTT
3721 ACTGGAACGG TAAATTCAGA GACTGCGCTT TCCATTCTGG CTTTAATGAA GATCCATTCG
3781 TTTGTGAATA TCAAGGCCAA TCGTCTGACC TGCCTCAACC TCCTGTCAAT GCTGGCGGCG
3841 GCTCTGGTGG TGGTTCTGGT GGCGGCTCTG AGGGTGGTGG CTCTGAGGGT GGCGGTTCTG
3901 AGGGTGGCGG CTCTGAGGGA GGCGGTTCCG GTGGTGGCTC TGGTTCCGGT GATTTTGATT
3961 ATGAAAAGAT GGCAAACGCT AATAAGGGGG CTATGACCGA AAATGCCGAT GAAAACGCGC
4021 TACAGTCTGA CGCTAAAGGC AAACTTGATT CTGTCGCTAC TGATTACGGT GCTGCTATCG
4081 ATGGTTTCAT TGGTGACGTT TCCGGCCTTG CTAATGGTAA TGGTGCTACT GGTGATTTTG
4141 CTGGCTCTAA TTCCCAAATG GCTCAAGTCG GTGACGGTGA TAATTCACCT TTAATGAATA
4201 ATTTCCGTCA ATATTTACCT TCCCTCCCTC AATCGGTTGA ATGTCGCCCT TTTGTCTTTA
4261 GCGCTGGTAA ACCATATGAA TTTTCTATTG ATTGTGACAA AATAAACTTA TTCCGTGGTG
4321 TCTTTGCGTT TCTTTTATAT GTTGCCACCT TTATGTATGT ATTTTCTACG TTTGCTAACA
4381 TACTGCGTAA TAAGGAGTCT TAATCATGCC AGTTCTTTTG GGTATTCCGT TATTATTGCG
4441 TTTCCTCGGT TTCCTTCTGG TAACTTTGTT CGGCTATCTG CTTACTTTTC TTAAAAAGGG
4501 CTTCGGTAAG ATAGCTATTG CTATTTCATT GTTTCTTGCT CTTATTATTG GGCTTAACTC
4561 AATTCTTGTG GGTTATCTCT CTGATATTAG CGCTCAATTA CCCTCTGACT TTGTTCAGGG
4621 TGTTCAGTTA ATTCTCCCGT CTAATGCGCT TCCCTGTTTT TATGTTATTC TCTCTGTAAA
4681 GGCTGCTATT TTCATTTTTG ACGTTAAACA AAAAATCGTT TCTTATTTGG ATTGGGATAA
4741 ATAATATGGC TGTTTATTTT GTAACTGGCA AATTAGGCTC TGGAAAGACG CTCGTTAGCG
4801 TTGGTAAGAT TCAGGATAAA ATTGTAGCTG GGTGCAAAAT AGCAACTAAT CTTGATTTAA
4861 GGCTTCAAAA CCTCCCGCAA GTCGGGAGGT TCGCTAAAAC GCCTCGCGTT CTTAGAATAC
4921 CGGATAAGCC TTCTATATCT GATTTGCTTG CTATTGGGCG CGGTAATGAT TCCTACGATG
4981 AAAATAAAAA CGGCTTGCTT GTTCTCGATG AGTGCGGTAC TTGGTTTAAT ACCCGTTCTT
5041 GGAATGATAA GGAAAGACAG CCGATTATTG ATTGGTTTCT ACATGCTCGT AAATTAGGAT
5101 GGGATATTAT TTTTCTTGTT CAGGACTTAT CTATTGTTGA TAAACAGGCG CGTTCTGCAT
5161 TAGCTGAACA TGTTGTTTAT TGTCGTCGTC TGGACAGAAT TACTTTACCT TTTGTCGGTA
5221 CTTTATATTC TCTTATTACT GGCTCGAAAA TGCCTCTGCC TAAATTACAT GTTGGCGTTG
5281 TTAAATATGG CGATTCTCAA TTAAGCCCTA CTGTTGAGCG TTGGCTTTAT ACTGGTAAGA
5341 ATTTGTATAA CGCATATGAT ACTAAACAGG CTTTTTCTAG TAATTATGAT TCCGGTGTTT
5401 ATTCTTATTT AACGCCTTAT TTATCACACG GTCGGTATTT CAAACCATTA AATTTAGGTC
5461 AGAAGATGAA ATTAACTAAA ATATATTTGA AAAAGTTTTC TCGCGTTCTT TGTCTTGCGA
5521 TTGGATTTGC ATCAGCATTT ACATATAGTT ATATAACCCA ACCTAAGCCG GAGGTTAAAA
5581 AGGTAGTCTC TCAGACCTAT GATTTTGATA AATTCACTAT TGACTCTTCT CAGCGTCTTA
5641 ATCTAAGCTA TCGCTATGTT TTCAAGGATT CTAAGGGAAA ATTAATTAAT AGCGACGATT
5701 TACAGAAGCA AGGTTATTCA CTCACATATA TTGATTTATG TACTGTTTCC ATTAAAAAAG
5761 GTAATTCAAA TGAAATTGTT AAATGTAATT AATTTTGTTT TCTTGATGTT TGTTTCATCA
5821 TCTTCTTTTG CTCAGGTAAT TGAAATGAAT AATTCGCCTC TGCGCGATTT TGTAACTTGG
5881 TATTCAAAGC AATCAGGCGA ATCCGTTATT GTTTCTCCCG ATGTAAAAGG TACTGTTACT
5941 GTATATTCAT CTGACGTTAA ACCTGAAAAT CTACGCAATT TCTTTATTTC TGTTTTACGT
6001 GCTAATAATT TTGATATGGT TGGTTCAATT CCTTCCATAA TTCAGAAGTA TAATCCAAAC
6061 AATCAGGATT ATATTGATGA ATTGCCATCA TCTGATAATC AGGAATATGA TGATAATTCC
6121 GCTCCTTCTG GTGGTTTCTT TGTTCCGCAA AATGATAATG TTACTCAAAC TTTTAAAATT
6181 AATAACGTTC GGGCAAAGGA TTTAATACGA GTTGTCGAAT TGTTTGTAAA GTCTAATACT
6241 TCTAAATCCT CAAATGTATT ATCTATTGAC GGCTCTAATC TATTAGTTGT TTCTGCACCT
6301 AAAGATATTT TAGATAACCT TCCTCAATTC CTTTCTACTG TTGATTTGCC AACTGACCAG
6361 ATATTGATTG AGGGTTTGAT ATTTGAGGTT CAGCAAGGTG ATGCTTTAGA TTTTTCATTT
6421 GCTGCTGGCT CTCAGCGTGG CACTGTTGCA GGCGGTGTTA ATACTGACCG CCTCACCTCT
6481 GTTTTATCTT CTGCTGGTGG TTCGTTCGGT ATTTTTAATG GCGATGTTTT AGGGCTATCA
6541 GTTCGCGCAT TAAAGACTAA TAGCCATTCA AAAATATTGT CTGTGCCACG TATTCTTACG
6601 CTTTCAGGTC AGAAGGGTTC TATCTCTGTT GGCCAGAATG TCCCTTTTAT TACTGGTCGT
6661 GTGACTGGTG AATCTGCCAA TGTAAATAAT CCATTTCAGA CGATTGAGCG TCAAAATGTA
6721 GGTATTTCCA TGAGCGTTTT TCCTGTTGCA ATGGCTGGCG GTAATATTGT TCTGGATATT
6781 ACCAGCAAGG CCGATAGTTT GAGTTCTTCT ACTCAGGCAA GTGATGTTAT TACTAATCAA
6841 AGAAGTATTG CTACAACGGT TAATTTGCGT GATGGACAGA CTCTTTTACT CGGTGGCCTC
6901 ACTGATTATA AAAACACTTC TCAAGATTCT GGCGTACCGT TCCTGTCTAA AATCCCTTTA
6961 ATCGGCCTCC TGTTTAGCTC CCGCTCTGAT TCCAACGAGG AAAGCACGTT ATACGTGCTC
7021 GTCAAAGCAA CCATAGTACG CGCCCTGTAG CGGCGCATTA AGCGCGGCGG GTGTGGTGGT
7081 TACGCGCAGC GTGACCGCTA CACTTGCCAG CGCCCTAGCG CCCGCTCCTT TCGCTTTCTT
7141 CCCTTCCTTT CTCGCCACGT TCGCCGGCTT TCCCCGTCAA GCTCTAAATC GGGGGCTCCC
7201 TTTAGGGTTC CGATTTAGTG CTTTACGGCA CCTCGACCCC AAAAAACTTG ATTTGGGTGA
7261 TGGTTCACGT AGTGGGCCAT CGCCCTGATA GACGGTTTTT CGCCCTTTGA CGTTGGAGTC
7321 CACGTTCTTT AATAGTGGAC TCTTGTTCCA AACTGGAACA ACACTCAACC CTATCTCGGG
7381 CTATTCTTTT GATTTATAAG GGATTTTGCC GATTTCGGAA CCACCATCAA ACAGGATTTT
7441 CGCCTGCTGG GGCAAACCAG CGTGGACCGC TTGCTGCAAC TCTCTCAGGG CCAGGCGGTG
7501 AAGGGCAATC AGCTGTTGCC CGTCTCACTG GTGAAAAGAA AAACCACCCT GGATCCAAGC
7561 TTGCAGGTGG CACTTTTCGG GGAAATGTGC GCGGAACCCC TATTTGTTTA TTTTTCTAAA
7621 TACATTCAAA TATGTATCCG CTCATGAGAC AATAACCCTG ATAAATGCTT CAATAATATT
7681 GAAAAAGGAA GAGTATGAGT ATTCAACATT TCCGTGTCGC CCTTATTCCC TTTTTTGCGG
7741 CATTTTGCCT TCCTGTTTTT GCTCACCCAG AAACGCTGGT GAAAGTAAAA GATGCTGAAG
7801 ATCAGTTGGG CGCACGAGTG GGTTACATCG AACTGGATCT CAACAGCGGT AAGATCCTTG
7861 AGAGTTTTCG CCCCGAAGAA CGTTTTCCAA TGATGAGCAC TTTTAAAGTT CTGCTATGTC
7921 ATACACTATT ATCCCGTATT GACGCCGGGC AAGAGCAACT CGGTCGCCGG GCGCGGTATT
7981 CTCAGAATGA CTTGGTTGAG TACTCACCAG TCACAGAAAA GCATCTTACG GATGGCATGA
8041 CAGTAAGAGA ATTATGCAGT GCTGCCATAA CCATGAGTGA TAACACTGCG GCCAACTTAC
8101 TTCTGACAAC GATCGGAGGA CCGAAGGAGC TAACCGCTTT TTTGCACAAC ATGGGGGATC
8161 ATGTAACTCG CCTTGATCGT TGGGAACCGG AGCTGAATGA AGCCATACCA AACGACGAGC
8221 GTGACACCAC GATGCCTGTA GCAATGCCAA CAACGTTGCG CAAACTATTA ACTGGCGAAC
8281 TACTTACTCT AGCTTCCCGG CAACAATTAA TAGACTGGAT GGAGGCGGAT AAAGTTGCAG
8341 GACCACTTCT GCGCTCGGCC CTTCCGGCTG GCTGGTTTAT TGCTGATAAA TCTGGAGCCG
8401 GTGAGCGTGG GTCTCGCGGT ATCATTGCAG CACTGGGGCC AGATGGTAAG CCCTCCCGTA
8461 TCGTAGTTAT CTACACGACG GGGAGTCAGG CAACTATGGA TGAACGAAAT AGACAGATCG
8521 CTGAGATAGG TGCCTCACTG ATTAAGCATT GGTAACTGTC AGACCAAGTT TACTCATATA
8581 TACTTTAGAT TGATTTAAAA CTTCATTTTT AATTTAAAAG GATCTAGGTG AAGATCCTTT
8641 TTGATAATCT CATGACCAAA ATCCCTTAAC GTGAGTTTTC GTTCCACTGT ACGTAAGACC
8701 CCCAAGCTTG TCGACTGAAT GGCGAATGGC GCTTTGCCTG GTTTCCGGCA CCAGAAGCGG
8761 TGCCGGAAAG CTGGCTGGAG TGCGATCTTC CTGAGGCCGA TACTGTCGTC GTCCCCTCAA
8821 ACTGGCAGAT GCACGGTTAC GATGCGCCCA TCTACACCAA CGTAACCTAT CCCATTACGG
8881 TCAATCCGCC GTTTGTTCCC ACGGAGAATC CGACGGGTTG TTACTCGCTC ACATTTAATG
8941 TTGATGAAAG CTGGCTACAG GAAGGCCAGA CGCGAATTAT TTTTGATGGC GTTCCTATTG
9001 GTTAAAAAAT GAGCTGATTT AACAAAAATT TAACGCGAAT TTTAACAAAA TATTAACGTT
9061 TACAATTTAA ATATTTGCTT ATACAATCTT CCTGTTTTTG GGGCTTTTCT GATTATCAAC
9121 CGGGGTACAT ATGATTGACA TGCTAGTTTT ACGATTACCG TTCATCGATT CTCTTGTTTG
9181 CTCCAGACTC TCAGGCAATG ACCTGATAGC CTTTGTAGAT CTCTCAAAAA TAGCTACCCT
9241 CTCCGGCATG AATTTATCAG CTAGAACGGT TGAATATCAT ATTGATGGTG ATTTGACTGT
9301 CTCCGGCCTT TCTCACCCTT TTGAATCTTT ACCTACACAT TACTCAGGCA TTGCATTTAA
9361 AATATATGAG GGTTCTAAAA ATTTTTATCC TTGCGTTGAA ATAAAGGCTT CTCCCGCAAA
9421 AGTATTACAG GGTCATAATG TTTTTGGTAC AACCGATTTA GCTTTATGCT CTGAGGCTTT
9481 ATTGCTTAAT TTTGCTAATT CTTTGCCTTG CCTGTATGAT TTATTGGATG TT
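
By way of illustration only, a condensed listing of this kind may be checked against the restriction-site annotations of the annotated listing by scanning for the recognition sequences. The following Python sketch is a minimal example under stated assumptions: the function name `find_sites` is hypothetical and not part of the disclosure, only the top strand is scanned, and the test fragment is a short stretch of the vector sequence containing the BamHI (GGATCC) and HindIII (AAGCTT) sites.

```python
def find_sites(seq, site, circular=True):
    """Return 1-based start positions of `site` on the top strand of `seq`.

    `find_sites` is an illustrative helper, not part of the disclosure.
    """
    s = "".join(seq.split()).upper()
    site = site.upper()
    # For a circular molecule, append the first len(site)-1 bases so that
    # sites spanning the origin are also found.
    search = s + s[: len(site) - 1] if circular else s
    hits = []
    pos = search.find(site)
    while pos != -1:
        hits.append(pos + 1)
        pos = search.find(site, pos + 1)
    return hits


# Short stretch of the vector sequence around the BamHI/HindIII junction
fragment = ("AAGGGCAATC AGCTGTTGCC CGTCTCACTG GTGAAAAGAA "
            "AAACCACCCT GGATCCAAGC TTGCAGGTGG")

print(find_sites(fragment, "GGATCC", circular=False))  # BamHI
print(find_sites(fragment, "AAGCTT", circular=False))  # HindIII
```

Positions are reported relative to the start of the string supplied; to scan the whole vector, the complete 9532-base sequence would be passed with `circular=True`.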
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Table 25: h3401-h2 captured Via CJ with BsmAI 
!   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
!   S   A   Q   D   I   Q   M   T   Q   S   P   A   T   L   S
  aGT GCA Caa gac atc cag atg acc cag tct cca gcc acc ctg tct
!  ApaLI...                                 a gcc acc ! L25,L6,L20,L2,L16,A11
! Extender. Bridge...

!  16  17  18  19  20  21  22  23  24  25  26  27  28  29  30
!   V   S   P   G   E   R   A   T   L   S   C   R   A   S   Q
  gtg tct cca ggg gaa agg gcc acc ctc tcc tgc agg gcc agt cag

!  31  32  33  34  35  36  37  38  39  40  41  42  43  44  45
!   S   V   S   N   N   L   A   W   Y   Q   Q   K   P   G   Q
  agt gtt agt aac aac tta gcc tgg tac cag cag aaa cct ggc cag

!  46  47  48  49  50  51  52  53  54  55  56  57  58  59  60
!   V   P   R   L   L   I   Y   G   A   S   T   R   A   T   D
  gtt ccc agg ctc ctc atc tat ggt gca tcc acc agg gcc act gat

!  61  62  63  64  65  66  67  68  69  70  71  72  73  74  75
!   I   P   A   R   F   S   G   S   G   S   G   T   D   F   T
  atc cca gcc agg ttc agt ggc agt ggg tct ggg aca gac ttc act

!  76  77  78  79  80  81  82  83  84  85  86  87  88  89  90
!   L   T   I   S   R   L   E   P   E   D   F   A   V   Y   Y
  ctc acc atc agc aga ctg gag cct gaa gat ttt gca gtg tat tac

!  91  92  93  94  95  96  97  98  99 100 101 102 103 104 105
!   C   Q   R   Y   G   S   S   P   G   W   T   F   G   Q   G
  tgt cag cgg tat ggt agc tca ccg ggg tgg acg ttc ggc caa ggg

! 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120
!   T   K   V   E   I   K   R   T   V   A   A   P   S   V   F
  acc aag gtg gaa atc aaa cga act gtg gct gca cca tct gtc ttc

! 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135
!   I   F   P   P   S   D   E   Q   L   K   S   G   T   A   S
  atc ttc ccg cca tct gat gag cag ttg aaa tct gga act gcc tct

! 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150
!   V   V   C   L   L   N   N   F   Y   P   R   E   A   K   V
  gtt gtg tgc ctg ctg aat aac ttc tat ccc aga gag gcc aaa gta

! 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165
!   Q   W   K   V   D   N   A   L   Q   S   G   N   S   Q   E
  cag tgg aag gtg gat aac gcc ctc caa tcg ggt aac tcc cag gag

! 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180
!   S   V   T   E   Q   D   S   K   D   S   T   Y   S   L   S
  agt gtc aca gag cag gac agc aag gac agc acc tac agc ctc agc

! 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195
!   S   T   L   T   L   S   K   A   D   Y   E   K   H   K   V
  agc acc ctg acg ctg agc aaa gca gac tac gag aaa cac aaa gtc

! 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210
!   Y   A   C   E   V   T   H   Q   G   L   S   S   P   V   T
  tac gcc tgc gaa gtc acc cat cag ggc ctg agc tcg cct gtc aca

! 211 212 213 214 215 216 217 218 219 220 221 222 223
!   K   S   F   N   K   G   E   C   K   G   E   F   A
  aag agc ttc aac aaa gga gag tgt aag ggc gaa ttc gc

Table 26: h3401-d8 KAPPA captured with CJ and BsmAI

!   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
!   S   A   Q   D   I   Q   M   T   Q   S   P   A   T   L   S
  aGT GCA Caa gac atc cag atg acc cag tct cct gcc acc ctg tct
!  ApaLI... Extender                        g gcc acc ! L25,L6,L20,L2,L16,A11
!                                           A GCC ACC CTG TCT ! L2

!  16  17  18  19  20  21  22  23  24  25  26  27  28  29  30
!   V   S   P   G   E   R   A   T   L   S   C   R   A   S   Q
  gtg tct cca ggt gaa aga gcc acc ctc tcc tgc agg gcc agt cag
! GTG TCT CCA GGG GAA AGA GCC ACC CTC TCC TGC ! L2

!  31  32  33  34  35  36  37  38  39  40  41  42  43  44  45
!   N   L   L   S   N   L   A   W   Y   Q   Q   K   P   G   Q
  aat ctt ctc agc aac tta gcc tgg tac cag cag aaa cct ggc cag

!  46  47  48  49  50  51  52  53  54  55  56  57  58  59  60
!   A   P   R   L   L   I   Y   G   A   S   T   G   A   I   G
  gct ccc agg ctc ctc atc tat ggt gct tcc acc ggg gcc att ggt

!  61  62  63  64  65  66  67  68  69  70  71  72  73  74  75
!   I   P   A   R   F   S   G   S   G   S   G   T   E   F   T
  atc cca gcc agg ttc agt ggc agt ggg tct ggg aca gag ttc act

!  76  77  78  79  80  81  82  83  84  85  86  87  88  89  90
!   L   T   I   S   S   L   Q   S   E   D   F   A   V   Y   F
  ctc acc atc agc agc ctg cag tct gaa gat ttt gca gtg tat ttc

!  91  92  93  94  95  96  97  98  99 100 101 102 103 104 105
!   C   Q   Q   Y   G   T   S   P   P   T   F   G   G   G   T
  tgt cag cag tat ggt acc tca ccg ccc act ttc ggc gga ggg acc

! 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120
!   K   V   E   I   K   R   T   V   A   A   P   S   V   F   I
  aag gtg gag atc aaa cga act gtg gct gca cca tct gtc ttc atc

! 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135
!   F   P   P   S   D   E   Q   L   K   S   G   T   A   S   V
  ttc ccg cca tct gat gag cag ttg aaa tct gga act gcc tct gtt

! 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150
!   V   C   P   L   N   N   F   Y   P   R   E   A   K   V   Q
  gtg tgc ccg ctg aat aac ttc tat ccc aga gag gcc aaa gta cag

! 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165
!   W   K   V   D   N   A   L   Q   S   G   N   S   Q   E   S
  tgg aag gtg gat aac gcc ctc caa tcg ggt aac tcc cag gag agt

! 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180
!   V   T   E   Q   D   N   K   D   S   T   Y   S   L   S   S
  gtc aca gag cag gac aac aag gac agc acc tac agc ctc agc agc

! 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195
!   T   L   T   L   S   K   V   D   Y   E   K   H   E   V   Y
  acc ctg acg ctg agc aaa gta gac tac gag aaa cac gaa gtc tac

! 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210
!   A   C   E   V   T   H   Q   G   L   S   S   P   V   T   K
  gcc tgc gaa gtc acc cat cag ggc ctt agc tcg ccc gtc acg aag

! 211 212 213 214 215 216 217 218 219 220 221 222 223
!   S   F   N   R   G   E   C   K   K   E   F   V
  agc ttc aac agg gga gag tgt aag aaa gaa ttc gtt t
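
The amino acid annotation of a captured kappa sequence such as those of Tables 25 and 26 can be cross-checked against its codons by straightforward translation with the standard genetic code. The following Python sketch is offered as an illustration only; the names `CODON_TABLE` and `translate` are hypothetical, and the input row is the first row of codons of Table 26 as read here.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna):
    """Translate a DNA string (spaces and case ignored) in reading frame 1.

    Trailing bases that do not complete a codon are dropped.
    """
    s = "".join(dna.split()).upper()
    usable = len(s) - len(s) % 3
    return "".join(CODON_TABLE[s[i:i + 3]] for i in range(0, usable, 3))


row = "aGT GCA Caa gac atc cag atg acc cag tct cct gcc acc ctg tct"
print(translate(row))
```

A mismatch between the printed string and the annotated residues of a row would point to a transcription error in either the codons or the annotation.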
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Table 27: V3-23 VH framework with variegated codons shown 




                   17  18  19  20  21  22
                   A   Q   P   A   M   A
5'-ctg tct gaa cG GCC cag ccG GCC atg gcc      29
3'-gac aga ctt gc cgg gtc ggc cgg tac cgg
   Scab         SfiI.............
                        NgoMI...
                               NcoI....

FR1(DP47/V3-23)
  23  24  25  26  27  28  29  30
  E   V   Q   L   L   E   S   G
 gaa|gtt|CAA|TTG|tta|gag|tct|ggt|  53
 ctt|caa|gtt|aac|aat|ctc|aga|cca|
        | MfeI  |

-----------------------------FR1-----------------------------
  31  32  33  34  35  36  37  38  39  40  41  42  43  44  45
  G   G   L   V   Q   P   G   G   S   L   R   L   S   C   A
|ggc|ggt|ctt|gtt|cag|cct|ggt|ggt|tct|tta|cgt|ctt|tct|tgc|gct|  98
|ccg|cca|gaa|caa|gtc|gga|cca|cca|aga|aat|gca|gaa|aga|acg|cga|

Sites to be varied--->       ***     ***     ***
------------FR1----------->|......CDR1.........|----FR2-----
  46  47  48  49  50  51  52  53  54  55  56  57  58  59  60
  A   S   G   F   T   F   S   S   Y   A   M   S   W   V   R
|gct|TCC|GGA|ttc|act|ttc|tct|tCG|TAC|Gct|atg|tct|tgg|gtt|cgC| 143
|cga|agg|cct|aag|tga|aag|aga|agc|atg|cga|tac|aga|acc|caa|gcg|
    | BspEI |                | BsiWI|                    |BstXI.

Sites to be varied--->                       ***     *** ***
--------------------FR2-------------------->|...CDR2.........
  61  62  63  64  65  66  67  68  69  70  71  72  73  74  75
  Q   A   P   G   K   G   L   E   W   V   S   A   I   S   G
|CAa|gct|ccT|GGt|aaa|ggt|ttg|gag|tgg|gtt|tct|gct|atc|tct|ggt| 188
|gtt|cga|gga|cca|ttt|cca|aac|ctc|acc|caa|aga|cga|tag|aga|cca|
.BstXI         |

......................CDR2..........................|---FR3--
  76  77  78  79  80  81  82  83  84  85  86  87  88  89  90
  S   G   G   S   T   Y   Y   A   D   S   V   K   G   R   F
|tct|ggt|ggc|agt|act|tac|tat|gct|gac|tcc|gtt|aaa|ggt|cgc|ttc| 233
|aga|cca|ccg|tca|tga|atg|ata|cga|ctg|agg|caa|ttt|cca|gcg|aag|

-----------------------------FR3-----------------------------
  91  92  93  94  95  96  97  98  99 100 101 102 103 104 105
  T   I   S   R   D   N   S   K   N   T   L   Y   L   Q   M
|act|atc|TCT|AGA|gac|aac|tct|aag|aat|act|ctc|tac|ttg|cag|atg| 278
|tga|tag|aga|tct|ctg|ttg|aga|ttc|tta|tga|gag|atg|aac|gtc|tac|
        | XbaI  |

----------------------------FR3---------------------------->|
 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120
  N   S   L   R   A   E   D   T   A   V   Y   Y   C   A   K
|aac|agC|TTA|AGg|gct|gag|gac|aCT|GCA|Gtc|tac|tat|tgc|gct|aaa| 323
|ttg|tcg|aat|tcc|cga|ctc|ctg|tga|cgt|cag|atg|ata|acg|cga|ttt|
      | AflII |              | PstI  |




..CDR3 1— FR4- 



121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 
DYEGTGYAFDIWGQG 
|gacltatlgaa|ggtlact|ggtltat| gctlttc|gaC|ATA|TGg|ggt|ca a|ggt| 368 
|ctg|ata|ctt|cca|tga|cca|ata|cga|aag|ctg|tat|acc|cca|gtt|cca| 
| Ndel | 



-FR4- 



->l 



136 137 138 139 140 141 142 
T M V T V S S 
|act|atG|GTC|ACC|gtc|tct|agt- 
|tga|tac|cag|tgg|cag|aga|tca- 
I BstEH I 



389 



419 



143 144 145 146 147 148 149 150 151 152 
ASTKGPSVFP 
gcc tcc acc aaG GGC CCa tcg GTC TTC ccc-3'
cgg agg tgg ttc ccg ggt agc cag aag ggg-5'

Bsp120I. BbsI...(2/2)

ApaI....



(SFPRMET) 5'-ctg tct gaa cG GCC cag ccG-3' 
(TOPFR1A) 5'-ctg tct gaa cG GCC cag ccG GCC atg gcc-

gaa|gtt|CAA|TTG|tta|gag|tct|ggt|-

|ggc|ggt|ctt|gtt|cag|cct|ggt|ggt|tct|tta-3'
(BOTFR1B) 3'-caa|gtc|gga|cca|cca|aga|aat|gca|gaa|aga|acg|cga|-

|cga|agg|cct|aag|tga|aag-5' ! bottom strand
(BOTFR2) 3'-acc|caa|gcg|-

|gtt|cga|gga|cca|ttt|cca|aac|ctc|acc|caa|aga|-5' ! bottom strand
(BOTFR3) 3'- a|cga|ctg|agg|caa|ttt|cca|gcg|aag|-

|tga|tag|aga|tct|ctg|ttg|aga|ttc|tta|tga|gag|atg|aac|gtc|tac|-
|ttg|tcg|aat|tcc|cga|ctc|ctg|tga-5'
(F06) 5'-gC|TTA|AGg|gct|gag|gac|aCT|GCA|Gtc|tac|tat|tgc|gct|aaa|-

|gac|tat|gaa|ggt|act|ggt|tat|gct|ttc|gaC|ATA|TGg|ggt|c-3'
(BOTFR4) 3'-cga|aag|ctg|tat|acc|cca|gtt|cca|-

|tga|tac|cag|tgg|cag|aga|tca-

cgg agg tgg ttc ccg ggt agc cag aag ggg-5' ! bottom strand
(BOTPRCPRIM) 3'-gg ttc ccg ggt agc cag aag ggg-5'

CDR1 diversity 

(ON-vgC1) 5'-|gct|TCC|GGA|ttc|act|ttc|tct|<1>|TAC|<1>|atg|<1>|-

! CDR1 6859

|tgg|gtt|cgC|CAa|gct|ccT|GG-3'

<1> stands for an equimolar mix of {ADEFGHIKLMNPQRSTVWY}; no C
(this is not a sequence) 

CDR2 diversity 

(ON-vgC2) 5'-ggt|ttg|gag|tgg|gtt|tct|<2>|atc|<2>|<3>|-

! CDR2

|tct|ggt|ggc|<1>|act|<1>|tat|gct|gac|tcc|gtt|aaa|gg-3'

! CDR2

! <1> is an equimolar mixture of {ADEFGHIKLMNPQRSTVWY}; no C
! <2> is an equimolar mixture of {YRVWGS}; no ACDEFHIKLMNPQT
! <3> is an equimolar mixture of {PS}; no ACDEFGHKLMNQRTVWY
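The theoretical amino-acid diversity encoded by the variegated positions of ON-vgC1 and ON-vgC2 can be worked out as a simple product. The sketch below is illustrative only: it assumes that <1> denotes the 19 amino acids other than Cys, that <2> and <3> are the six- and two-member sets listed above, and that each variegated position varies independently. The names `MIXES`, `CDR1_SLOTS`, `CDR2_SLOTS` and `diversity` are hypothetical. Under these assumptions the CDR1 value agrees with the figure 6859 annotated beside the CDR1 oligonucleotide.

```python
from math import prod

# Amino-acid sets read from the <1>, <2>, <3> definitions above.
# Assumption: <1> is every amino acid except Cys (19 residues).
MIXES = {
    "<1>": set("ADEFGHIKLMNPQRSTVWY"),
    "<2>": set("YRVWGS"),
    "<3>": set("PS"),
}

# Variegated positions in each oligonucleotide, read 5' to 3'.
CDR1_SLOTS = ["<1>", "<1>", "<1>"]
CDR2_SLOTS = ["<2>", "<2>", "<3>", "<1>", "<1>"]


def diversity(slots):
    """Number of distinct amino-acid combinations over the mixed positions."""
    return prod(len(MIXES[s]) for s in slots)


if __name__ == "__main__":
    cdr1 = diversity(CDR1_SLOTS)
    cdr2 = diversity(CDR2_SLOTS)
    print("CDR1:", cdr1)
    print("CDR2:", cdr2)
    print("CDR1 x CDR2:", cdr1 * cdr2)
```

These are upper bounds on encoded diversity; the number of distinct clones actually obtained in a library depends on synthesis and transformation efficiency.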






Table 30: Oligonucleotides used to clone CDR1/2 diversity
All sequences are 5' to 3'.

1) ON_CD1Bsp, 30 bases
AccTcAcTggcTTccggA TTcAcTTTcTcT

2) ON_Br12, 42 bases
AgAAAcccAcTccAAAcc TTTAccAggAgcTTggcg AAcccA

3) ON_CD2Xba, 51 bases
ggAAggcAgTgATcTAgA gATAgTgAAgcgAccTTT AAcggAgTcAgcATA

4) ON_BotXba, 23 bases
ggAAggcAgTgATcTAgA gATAg
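A transcription of Table 30 can be sanity-checked against the base counts stated beside each oligonucleotide name. The following sketch is offered under assumptions: the sequences are as read from the table, the second oligonucleotide name is taken to be ON_Br12, and the helper names (`OLIGOS`, `length_report`, `has_site`) are hypothetical. It also checks for the BspEI (TCCGGA) and XbaI (TCTAGA) recognition sequences that the oligonucleotide names suggest.

```python
# Sequences as read from Table 30 (5' to 3'), paired with the stated length.
OLIGOS = {
    "ON_CD1Bsp": ("AccTcAcTggcTTccggA" "TTcAcTTTcTcT", 30),
    "ON_Br12": ("AgAAAcccAcTccAAAcc" "TTTAccAggAgcTTggcg" "AAcccA", 42),
    "ON_CD2Xba": (
        "ggAAggcAgTgATcTAgA" "gATAgTgAAgcgAccTTT" "AAcggAgTcAgcATA",
        51,
    ),
    "ON_BotXba": ("ggAAggcAgTgATcTAgA" "gATAg", 23),
}


def length_report(oligos):
    """Map each name to True when the sequence length equals the stated length."""
    return {name: len(seq) == stated for name, (seq, stated) in oligos.items()}


def has_site(seq, site):
    """Case-insensitive test for a recognition sequence on the written strand."""
    return site.upper() in seq.upper()


if __name__ == "__main__":
    print(length_report(OLIGOS))
    print("BspEI site in ON_CD1Bsp:", has_site(OLIGOS["ON_CD1Bsp"][0], "TCCGGA"))
    print("XbaI site in ON_CD2Xba:", has_site(OLIGOS["ON_CD2Xba"][0], "TCTAGA"))
```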
Table 31: Bridge/Extender Oligonucleotides

ON_Lam1aB7(rc)      GTGCTGACTCAGCCACCCTC.
ON_Lam2aB7(rc)      GCCCTGACTCAGCCTGCCTC.
ON_Lam3lB7(rc)      GAGCTGACTCAGG.ACCCTGC
ON_Lam3rB7(rc)      GAGCTGACTCAGCCACCCTC.
ON_LamHf1cBrg(rc)   CCTCGACAGCGAAGTGCACAGAGCGTCTTGACTCAGCC
ON_LamHf1cExt       CCTCGACAGCGAAGTGCACAGAGCGTCTTG
ON_LamHf2b2Brg(rc)  CCTCGACAGCGAAGTGCACAGAGCGCTTTGACTCAGCC
ON_LamHf2b2Ext      CCTCGACAGCGAAGTGCACAGAGCGCTTTG
ON_LamHf2dBrg(rc)   CCTCGACAGCTAAGTGCACAGAGCGCTTTGACTCAGCC
ON_LamHf2dExt       CCTCGACAGCGAAGTGCACAGAGCGCTTTG
ON_LamHf3lBrg(rc)   CCTCGACAGCGAAGTGCACAGAGCGAATTGACTCAGCC
ON_LamHf3lExt       CCTCGACAGCGAAGTGCACAGAGCGAATTG
ON_LamHf3rBrg(rc)   CCTCGACAGCGAAGTGCACAGTACGAATTGACTCAGCC
ON_LamHf3rExt       CCTCGACAGCGAAGTGCACAGTACGAATTG
ON_lamPlePCR        CCTCGACAGCGAAGTGCACAG

Consensus
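In the bridge/extender pairs of Table 31, each listed bridge string appears to begin with the sequence of its extender and then continue into the lambda framework. The sketch below checks that relationship position by position. It is a consistency check on the table as transcribed, not a statement of the design method: it assumes the "(rc)" strings are compared as written, and the names `PAIRS`, `PCR_PRIMER` and `mismatches` are hypothetical. A reported mismatch may reflect either a deliberate design difference or a transcription error; the code cannot distinguish the two.

```python
# (bridge as listed, extender as listed) for each lambda family in Table 31.
PAIRS = {
    "Hf1c": ("CCTCGACAGCGAAGTGCACAGAGCGTCTTGACTCAGCC",
             "CCTCGACAGCGAAGTGCACAGAGCGTCTTG"),
    "Hf2b2": ("CCTCGACAGCGAAGTGCACAGAGCGCTTTGACTCAGCC",
              "CCTCGACAGCGAAGTGCACAGAGCGCTTTG"),
    "Hf2d": ("CCTCGACAGCTAAGTGCACAGAGCGCTTTGACTCAGCC",
             "CCTCGACAGCGAAGTGCACAGAGCGCTTTG"),
    "Hf3l": ("CCTCGACAGCGAAGTGCACAGAGCGAATTGACTCAGCC",
             "CCTCGACAGCGAAGTGCACAGAGCGAATTG"),
    "Hf3r": ("CCTCGACAGCGAAGTGCACAGTACGAATTGACTCAGCC",
             "CCTCGACAGCGAAGTGCACAGTACGAATTG"),
}
PCR_PRIMER = "CCTCGACAGCGAAGTGCACAG"  # ON_lamPlePCR


def mismatches(bridge, extender):
    """0-based positions where the bridge differs from the extender over the
    extender's length (zip stops at the shorter string)."""
    return [i for i, (b, e) in enumerate(zip(bridge, extender)) if b != e]


if __name__ == "__main__":
    for name, (bridge, extender) in PAIRS.items():
        print(name,
              "mismatches:", mismatches(bridge, extender),
              "bridge-only tail:", bridge[len(extender):],
              "extender starts with PCR primer:", extender.startswith(PCR_PRIMER))
```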
Table 32: Oligonucleotides used to make ssDNA locally
double-stranded

Adapters (8)

H43HF3.1-02#1     5'-cc gtg tat tac tgt gcg aga g-3'
H43.77.97.1-03#2  5'-ct gtg tat tac tgt gcg aga g-3'
H43.77.97.323#22  5'-cc gta tat tac tgt gcg aaa g-3'
H43.77.97.330#23  5'-ct gtg tat tac tgt gcg a la g-3'
H43.77.97.439#44  5'-ct gtg tat tac tgt gcg aga c-3'
H43.77.97.551#48  5'-cc atg tat tac tgt gcg aga c-3'



Table 33: Bridge/extender pairs

Bridges (2)
H43.XABr1
5'-ggtgtagtgaTCTAGtgacaactctaagaatactctctacttgcagatgaacagC
TTtAGggctgaggacaCTGCAGtctactattgtgcgaga-3'

H43.XABr2
5'-ggtgtagtgaTCTAGtgacaactctaagaatactctctacttgcagatgaacagC
TTtAGggctgaggacaCTGCAGtctactattgtgcgaaa-3'

Extender
H43.XAExt
5'-ATAgTAgAcTgcAgTgTccTcAgcccTTAAgcTgTTcATcTgcAAgTAgAgAgTA
TTcTTAgAgTTgTcTcTAgATcAcTAcAcc-3'



Table 34: PCR primers
Primers

H43.XAPCR2  gactgggTgTAgTgATcTAg
Hucmnest    cttttctttgttgccgttggggtg
Table 35: PCR program for amplification of
heavy chain CDR3 DNA

95 degrees C    5 minutes

95 degrees C    20 seconds
60 degrees C    30 seconds    repeat 20x
72 degrees C    1 minute

72 degrees C    7 minutes
4 degrees C     hold

Reagents (100 ul reaction):

Template             5 ul ligation mix
10x PCR buffer       1x
Taq                  5 U
dNTPs                200 uM each
MgCl2                2 mM
H43.XAPCR2-biotin    400 nM
Hucmnest             200 nM
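The program and reagent list in Table 35 lend themselves to a short arithmetic check. The sketch below is a minimal illustration: it totals the programmed hold times (ramping between temperatures is ignored, which is an assumption) and converts the stated final concentrations into amounts per 100 ul reaction. The names `INITIAL`, `CYCLE`, `FINAL`, `programmed_seconds` and `picomoles` are hypothetical.

```python
# Steps from Table 35 as (label, temperature in degrees C, seconds).
INITIAL = [("denature", 95, 5 * 60)]
CYCLE = [("denature", 95, 20), ("anneal", 60, 30), ("extend", 72, 60)]
N_CYCLES = 20
FINAL = [("extend", 72, 7 * 60)]


def programmed_seconds():
    """Total programmed hold time, excluding ramp times and the 4 C hold."""
    per_cycle = sum(sec for _, _, sec in CYCLE)
    fixed = sum(sec for _, _, sec in INITIAL + FINAL)
    return fixed + N_CYCLES * per_cycle


def picomoles(conc_nM, volume_ul):
    """Amount in pmol: nM x ul = 1e-9 mol/L x 1e-6 L = 1e-3 pmol."""
    return conc_nM * volume_ul / 1000.0


if __name__ == "__main__":
    total = programmed_seconds()
    print("programmed time:", total // 60, "min", total % 60, "s")
    print("H43.XAPCR2-biotin:", picomoles(400, 100), "pmol")
    print("Hucmnest:", picomoles(200, 100), "pmol")
    print("each dNTP:", picomoles(200_000, 100) / 1000.0, "nmol")
```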

Table 36: Annotated sequence of CJR DY3F7 (CJR-A05) 10251 bases



Non-cutters

BclI     Tgatca      BsiWI   Cgtacg     BssSI   Cacgag
BstZ17I  GTAtac      BtrI    CACgtg     EcoRV   GATatc
FseI     GGCCGGcc    HpaI    GTTaac     MluI    Acgcgt
PmeI     GTTTaaac    PmlI    CACgtg     PpuMI   RGgwccy
RsrII    CGgwccg     SapI    GCTCTTC    SexAI   Accwggt
SgfI     GCGATcgc    SgrAI   CRccggyg   SphI    GCATGc
StuI     AGGcct      XmaI    Cccggg
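Recognition sequences in these lists use IUPAC ambiguity codes (R, Y, W, N and so on). One way to check whether an enzyme is a non-cutter of a given sequence is to expand the site into a regular expression and scan both strands. The sketch below is illustrative only: the function names are hypothetical, the sequence is treated as linear (a site spanning the origin of a circular vector would be missed), and the reported positions are 1-based starts of the recognition sequence on the top strand, which may not be the numbering convention used in Table 36.

```python
import re

# IUPAC nucleotide codes expanded to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "M": "[AC]", "K": "[GT]",
    "S": "[CG]", "W": "[AT]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGTRYMKSWBDHVN", "TGCAYRKMSWVHDBN")


def revcomp(site):
    """Reverse complement of a site, ambiguity codes included."""
    return site.upper().translate(COMPLEMENT)[::-1]


def find_sites(seq, site):
    """Sorted 1-based top-strand start positions of `site` on either strand."""
    seq = seq.upper()
    hits = set()
    for s in {site.upper(), revcomp(site)}:
        # Lookahead so that overlapping occurrences are all reported.
        pattern = "(?=" + "".join(IUPAC[b] for b in s) + ")"
        hits.update(m.start() + 1 for m in re.finditer(pattern, seq))
    return sorted(hits)


if __name__ == "__main__":
    fr4 = "actatGGTCACCgtctctagt"  # FR4 codons shown earlier, BstEII in capitals
    print("BstEII:", find_sites(fr4, "Ggtnacc"))
    print("StuI:", find_sites(fr4, "AGGcct"))  # empty list means non-cutter
```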



Enzymes that cut from 1 to 4 times and other features

Feature or enzyme        Recognition            Cuts  Position(s)

End of genes II and X                                 [illegible]
Start gene V                                          843
BsrGI                    Tgtaca                 1     1021
BspMI                    Nnnnnnnnngcaggt        [illegible]
-"-                      ACCTGCNNNNn            1     [illegible]
End of gene V                                         [illegible]
Start gene VII                                        1108
BsaBI                    GATNNnnatc             [illegible]
Start gene IX                                         [illegible]
End gene VII                                          [illegible]
SnaBI                    TACgta                 2     1268 [illegible]
BspHI                    Tcatga                 [illegible]
Start gene VIII                                       1301
End gene IX                                           [illegible]
End gene VIII                                         1522
Start gene III                                        1578
EagI                     Cggccg                 [illegible]
XbaI                     Tctaga                 2     1643 8436
KasI                     Ggcgcc                 4     [illegible]
BsmI                     GAATGCN                2     1769 9065
BseRI                    GAGGAGNNNNNNNNNN       2     [illegible]
-"-                      NNnnnnnnnnctcctc       [illegible]
AlwNI                    CAGNNNctg              [illegible]
BspDI                    ATcgat                 [illegible]
NdeI                     CAtatg                 3     2716 3796
End gene III                                          2846
Start gene VI                                         2848
AfeI                     AGCgct                 1     3032
End gene VI                                           3187
Start gene I                                          3189
EarI                     CTCTTCNnnn             2     4067 9274
-"-                      Nnnnngaagag            2     6126 8953
PacI                     TTAATtaa               1     4125
Start gene IV                                         4213
End gene I                                            4235
BsmFI                    Nnnnnnnnnnnnnnngtccc   2     5068 9515
MscI                     TGGcca                 3     5073 7597
PsiI                     TTAtaa                 2     5349 5837
End gene IV                                           5493
Start ori                                             5494
NgoMIV                   Gccggc                 3     5606 8213
BanII                    GRGCYc                 4     5636 8080
DraIII                   CACNNNgtg              1     5709
DrdI                     GACNNNNnngtc           1     5752
AvaI                     Cycgrg                 2     5818 7240
PvuII                    CAGctg                 1     5953

Third and fourth positions (row assignment lost): 9039 9120 9160 9315 8606 8889

BsmBI                    CGTCTCNnnnn            3     5964 8585
End ori region                                        5993
BamHI                    Ggatcc                 1     5994
HindIII                  Aagctt                 3     6000 7147
BciVI                    GTATCCNNNNNN           1     6077
Start bla                                             6138
Eco57I                   CTGAAG                 2     6238 7716
SpeI                     Actagt                 1     6257
BcgI                     gcannnnnntcg           1     6398
ScaI                     AGTact                 1     6442
PvuI                     CGATcg                 1     6553
FspI                     TGCgca                 1     6700
BglI                     GCCNNNNnggc            3     6801 8208
BsaI                     GGTCTCNnnnn            1     6853
AhdI                     GACNNNnngtc            1     6920
Eam1105I                 GACNNNnngtc            1     6920
End bla                                               6998
AccI                     GTmkac                 2     7153 8048
HincII                   GTYrac                 1     7153
SalI                     Gtcgac                 1     7153
XhoI                     Ctcgag                 1     7240
Start PlacZ region                                    7246
End PlacZ region                                      7381
PflMI                    CCANNNNntgg            1     7382
RBS1                                                  7405
start M13-iii signal seq for LC                       7418
ApaLI                    Gtgcac                 1     7470
end M13-iii signal seq                                7471
Start light chain kappa L20:JK1                       7472
PflFI                    GACNnngtc              3     7489 8705
SbfI                     CCTGCAgg               1     7542
PstI                     CTGCAg                 1     7543
KpnI                     GGTACc                 1     7581
XcmI                     CCANNNNNnnnntgg        2     7585 9215
NsiI                     ATGCAt                 2     7626 9503
BsgI                     ctgcac                 1     7809
BbsI                     gtcttc                 2     7820 8616
BlpI                     GCtnagc                1     8017
EspI                     GCtnagc                1     8017
EcoO109I                 RGgnccy                2     8073 8605
Ecl136I                  GAGctc                 1     8080
SacI                     GAGCTc                 1     8080
End light chain                                       8122
AscI                     GGcgcgcc               1     8126
BssHII                   Gcgcgc                 1     8127
RBS2                                                  8147
SfiI                     GGCCNNNNnggcc          1     8207
NcoI                     Ccatgg                 1     8218
Start 3-23, FR1                                       8226
MfeI                     Caattg                 1     8232
BspEI                    Tccgga                 1     8298
Start CDR1                                            8316
Start FR2                                             8331
BstXI                    CCANNNNNntgg           2     8339 8812
EcoNI                    CCTNNnnnagg            2     8346 8675
Start FR3                                             8373
XbaI                     Tctaga                 2     8436 1643
AflII                    Cttaag                 1     8480
Start CDR3                                            8520
AatII                    GACGTc                 1     8556
Start FR4                                             8562
PshAI                    GACNNnngtc             2     8573 9231
BstEII                   Ggtnacc                1     8579
Start CH1                                             8595
ApaI                     GGGCCc                 1     8606
Bsp120I                  Gggccc                 1     8606
PspOMI                   Gggccc                 1     8606
AgeI                     Accggt                 1     8699
Bsu36I                   CCtnagg                2     8770 9509
End of CH1                                            8903
NotI                     GCggccgc               1     8904
Start His6 tag                                        8913
Start cMyc tag                                        8931
Amber codon                                           8982
NheI                     Gctagc                 1     8985
Start M13 III Domain 3                                8997
NruI                     TCGcga                 1     9106
BstBI                    TTcgaa                 1     9197
EcoRI                    Gaattc                 1     9200
XcmI                     CCANNNNNnnnntgg        1     9215
BstAPI                   GCANNNNntgc            1     9337
SacII                    CCGCgg                 1     9365
End IIIstump anchor                                   [illegible]
AvrII                    Cctagg                 1     9462
trp terminator                                        9470
SwaI                     ATTTaaat               1     9784
Start gene II                                         9850
BglII                    Agatct                 1     9936



1 aat gct act act att agt aga att gat gcc acc ttt tca gct cgc gcc
! gene ii continued
49 cca aat gaa aat ata gct aaa cag gtt att gac cat ttg cga aat gta
97 tct aat ggt caa act aaa tct act cgt tcg cag aat tgg gaa tca act
145 gtt aTa tgg aat gaa act tcc aga cac cgt act tta gtt gca tat tta
193 aaa cat gtt gag cta cag caT TaT att cag caa tta agc tct aag cca
241 tcc gca aaa atg acc tct tat caa aag gag caa tta aag gta ctc tct
289 aat cct gac ctg ttg gag ttt gct tcc ggt ctg gtt cgc ttt gaa gct
337 cga att aaa acg cga tat ttg aag tct ttc ggg ctt cct ctt aat ctt
385 ttt gat gca atc cgc ttt gct tct gac tat aat agt cag ggt aaa gac
433 ctg att ttt gat tta tgg tca ttc tcg ttt tct gaa ctg ttt aaa gca
481 ttt gag ggg gat tca ATG aat att tat gac gat tcc gca gta ttg gac
! Start gene x, ii continues
529 gct atc cag tct aaa cat ttt act att acc ccc tct ggc aaa act tct
577 ttt gca aaa gcc tct cgc tat ttt ggt ttt tat cgt cgt ctg gta aac
625 gag ggt tat gat agt gtt gct ctt act atg cct cgt aat tcc ttt tgg
673 cgt tat gta tct gca tta gtt gaa tgt ggt att cct aaa tct caa ctg
721 atg aat ctt tct acc tgt aat aat gtt gtt ccg tta gtt cgt ttt att
769 aac gta gat ttt tct tcc caa cgt cct gac tgg tat aat gag cca gtt
817 ctt aaa atc gca TAA
! End X & II

832 ggtaattca ca

! M1 E5 Q10 T15
843 ATG att aaa gtt gaa att aaa cca tct caa gcc caa ttt act act cgt
! Start gene V
!
! S17 S20 P25 E30
891 tct ggt gtt tct cgt cag ggc aag cct tat tca ctg aat gag cag ctt
!
! V35 E40 V45
939 tgt tac gtt gat ttg ggt aat gaa tat ccg gtt ctt gtc aag att act
!
! D50 A55 L60
987 ctt gat gaa ggt cag cca gcc tat gcg cct ggt cTG TAC Acc gtt cat
! BsrGI...
! L65 V70 S75 R80
1035 ctg tcc tct ttc aaa gtt ggt cag ttc ggt tcc ctt atg att gac cgt
!
! P85 K87 end of V
1083 ctg cgc ctc gtt ccg gct aag TAA C
!
1108 ATG gag cag gtc gcg gat ttc gac aca att tat cag gcg atg
! Start gene VII
!
1150 ata caa atc tcc gtt gta ctt tgt ttc gcg ctt ggt ata atc
! VII and IX overlap.
! S2 V3 L4 V5 S10
1192 gct ggg ggt caa agA TGA gt gtt tta gtg tat tct ttT gcc tct ttc gtt
! End VII
! start IX
! L13 W15 G20 T25 E29
1242 tta ggt tgg tgc ctt cgt agt ggc att acg tat ttt acc cgt tta atg gaa
!
1293 act tcc tc
!
! .... stop of IX, IX and VIII overlap by four bases
1301 ATG aaa aag tct tta gtc ctc aaa gcc tct gta gcc gtt gct acc ctc
! Start signal sequence of viii.
!
1349 gtt ccg atg ctg tct ttc gct gct gag ggt gac gat ccc gca aaa gcg
! mature VIII >
1397 gcc ttt aac tcc ctg caa gcc tca gcg acc gaa tat atc ggt tat gcg
1445 tgg gcg atg gtt gtt gtc att
1466 gtc ggc gca act atc ggt atc aag ctg ttt aag



! bases 1499-1539 are probable promoter for iii
1499 aaa ttc acc tcg aaa gca ! 1515
! -35 ..
!
1517 agc tga taaaccgat acaattaaag gctccttttg
! -10
!
1552 gagccttttt ttt GGAGAt ttt ! S.D. uppercase, there may be 9 Ts
!
! iii signal sequence
! MKKLLFAI PLVVPF
1574 caac GTG aaa aaa tta tta ttc gca att cct tta gtt gtt cct ttc ! 1620
1 

H L D G A
1 caT CTA GAc ggc gcc
XbaI



Domain 1 



1656 



AAA ACt 
1 
1 

V 

1734 t 
ggc gtt 



s 


G 


A 


A 


E 


tct 


ggc 


gCG 


GCC 


Gaa 






EagI . . 




A 


E 


T 


V 


E 


get 


gaa 


act 


gtt 


gaa 


S 


H 


T 


E 


I 


Tec 


cat 


aca 


gaa 


aa t 


D 


R 


Y 


A 


N 


gat 


cgt 


tac 


get 


aac 


V 


C 


T 


G 


D 


gtt 


tgt 


act 


ggt 


GAC 


L 


A 


I 


P 


E 


Ctt 


get 


ate 


cct 


gaa 


G 


G 


G 


S 


E 


ggt 


ggt 


ggc 


tct 


gag 


G 


G 


G 


S 


E 



N V 



W K D 
TGG AAA GAC 

W N A 
tgG AAT GCt 

BsmI .... 
G T W 



cct att 



! L1 linker
! E
1857 gag
!
! EGGGSEGGGT
1887 gag ggt ggc ggt tct gag ggt ggc ggt act
!
! Domain 2
1917 aaa cct cct gag tac ggt gat aca cct att ccg ggc tat act tat atc aac
1968 cct ctc gac ggc act tat ccg cct ggt act gag caa aac ccc gct aat cct
2019 aat cct tct ctt GAG GAG tct cag cct ctt aat act ttc atg ttt cag aat
! BseRI..
2070 aat agg ttc cga aat agg cag ggg gca tta act gtt tat acg ggc act
2118 gtt act caa ggc act gac ccc gtt aaa act tat tac cag tac act cct
2166 gta tca tca aaa gcc atg tat gac gct tac tgg aac ggt aaa ttC AGA
! AlwNI
2214 GAC TGc gct ttc cat tct ggc ttt aat gaG gat TTa ttT gtt tgt gaa
! AlwNI
2262 tat caa ggc caa tcg tct gac ctg cct caa cct cct gtc aat gct
!
2307 ggc ggc ggc tct
! start L2
2319 ggt ggt ggt tct
2331 ggt ggc ggc tct
2343 gag ggt ggt ggc tct gag gga ggc ggt tcc
2373 ggt ggt ggc tct ggt ! end L2

! Many published sequences of M13-derived phage have a longer linker 
! than shown here by repeats of the EGGGS motif two more times. 

! Domain 3 

! SG DFDYEKMANANKGA
2388 tcc ggt gat ttt gat tat gaa aag atg gca aac gct aat aag ggg gct
!
! MTENADENALQS DAKG
2436 atg acc gaa aat gcc gat gaa aac gcg cta cag tct gac gct aaa ggc
!
! KLDSVATDYGAAIDGF
2484 aaa ctt gat tct gtc gct act gat tac ggt gct gct atc gat ggt ttc
!
! IGDVSGLANGNGATGD
2532 att ggt gac gtt tcc ggc ctt gct aat ggt aat ggt gct act ggt gat
!
! FAGSNSQMAQVGDGDN
2580 ttt gct ggc tct aat tcc caa atg gct caa gtc ggt gac ggt gat aat
!
! SPLMNNFRQYLPSLPQ
2628 tca cct tta atg aat aat ttc cgt caa tat tta cct tcc ctc cct caa
!
! SVECRPFVFGAGKPYE
2676 tcg gtt gaa tgt cgc cct ttt gtc ttt Ggc gct ggt aaa cca tat gaa
!
! FS I DCDKINLFR
2724 ttt tct att gat tgt gac aaa ata aac tta ttc cgt
! End Domain 3
!
! GVFAFLLYVAT FMYV F140
2760 ggt gtc ttt gcg ttt ctt tta tat gtt gcc acc ttt atg tat gta ttt
! start transmembrane segment
!
! S T F A N I L
2808 tct acg ttt gct aac ata ctg
!
! R N K E S
2829 cgt aat aag gag tct TAA ! stop of iii
! Intracellular anchor.

! M1 P2 V L L5 G I P L L10 L R F L G15
2847 tc ATG cca gtt ctt ttg ggt att ccg tta tta ttg cgt ttc ctc ggt
! Start VI
!
2894 ttc ctt ctg gta act ttg ttc ggc tat ctg ctt act ttt ctt aaa aag
2942 ggc ttc ggt aag ata gct att gct att tca ttg ttt ctt gct ctt att
2990 att ggg ctt aac tca att ctt gtg ggt tat ctc tct gat att agc gct
3038 caa tta ccc tct gac ttt gtt cag ggt gtt cag tta att ctc ccg tct
3086 aat gcg ctt ccc tgt ttt tat gtt att ctc tct gta aag gct gct att
3134 ttc att ttt gac gtt aaa caa aaa atc gtt tct tat ttg gat tgg gat
!
! M1 A2 V3 F5 L10 G13
3182 aaa TAA t ATG gct gtt tat ttt gta act ggc aaa tta ggc tct gga
! end VI Start gene I
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I 

att 
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i 
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G 
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aat 


L 
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D 

gat 
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tta 


R 
agg 
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ctt 


Q 
caa 


N 
aac 


L 
etc 


! P Q V G R F A K T P R V L R I
3318 ccg caa gtc ggg agg ttc gct aaa acg cct cgc gtt ctt aga ata
!
! P D K P S I S D L L A I G R G
3363 ccg gat aag cct tct ata tct gat ttg ctt gct att ggg cgc ggt
!
! N D S Y D E N K N G L L V L D
3408 aat gat tcc tac gat gaa aat aaa aac ggc ttg ctt gtt ctc gat
!
! E C G T W F N T R S W N D K E
3453 gag tgc ggt act tgg ttt aat acc cgt tct tgg aat gat aag gaa
!
! R Q P I I D W F L H A R K L G
3498 aga cag ccg att att gat tgg ttt cta cat gct cgt aaa tta gga
!
! W D I I F L V Q D L S I V D K
3543 tgg gat att att ttt ctt gtt cag gac tta tct att gtt gat aaa
!
! Q A R S A L A E H V V Y C R R
3588 cag gcg cgt tct gca tta gct gaa cat gtt gtt tat tgt cgt cgt
!
! L D R I T L P F V G T L Y S L
3633 ctg gac aga att act tta cct ttt gtc ggt act tta tat tct ctt
!
! I T G S K M P L P K L H V G V
3678 att act ggc tcg aaa atg cct ctg cct aaa tta cat gtt ggc gtt
!
! V K Y G D S Q L S P T V E R W
3723 gtt aaa tat ggc gat tct caa tta agc cct act gtt gag cgt tgg
!
! L Y T G K N L Y N A Y D T K Q
3768 ctt tat act ggt aag aat ttg tat aac gca tat gat act aaa cag
!
! A F S S N Y D S G V Y S Y L T
3813 gct ttt tct agt aat tat gat tcc ggt gtt tat tct tat tta acg
!
! P Y L S H G R Y F K P L N L G
3858 cct tat tta tca cac ggt cgg tat ttc aaa cca tta aat tta ggt
!
! Q K M K L T K I Y L K K F S R
3903 cag aag atg aaa tta act aaa ata tat ttg aaa aag ttt tct cgc
!
! V L C L A I G F A S A F T Y S
3948 gtt ctt tgt ctt gcg att gga ttt gca tca gca ttt aca tat agt
!
! Y I T Q P K P E V K K V V S Q
3993 tat ata acc caa cct aag ccg gag gtt aaa aag gta gtc tct cag
!
! T Y D F D K F T I D S S Q R L
4038 acc tat gat ttt gat aaa ttc act att gac tct tct cag cgt ctt
!
! N L S Y R Y V F K D S K G K L
4083 aat cta agc tat cgc tat gtt ttc aag gat tct aag gga aaa TTA
! PacI
!
! I N S D D L Q K Q G Y S L T Y
4128 ATT AAt agc gac gat tta cag aag caa ggt tat tca ctc aca tat
! PacI



! i I DLCTVS I KKGNSNE
! iv M1 K
4173 att gat tta tgt act gtt tcc att aaa aaa ggt aat tca aAT Gaa
! Start IV
!
! i I V K C N .End of I
! iv L3 L N5 V I7 N F V10
4218 att gtt aaa tgt aat TAA T TTT GTT
! IV continued
4243 ttc ttg atg ttt gtt tca tca tct tct ttt gct cag gta att gaa atg
4291 aat aat tcg cct ctg cgc gat ttt gta act tgg tat tca aag caa tca
4339 ggc gaa tcc gtt att gtt tct ccc gat gta aaa ggt act gtt act gta
4387 tat tca tct gac gtt aaa cct gaa aat cta cgc aat ttc ttt att tct
4435 gtt tta cgt gcA aat aat ttt gat atg gtA ggt tcT aAC cct tcc atT
4483 att cag aag tat aat cca aac aat cag gat tat att gat gaa ttg cca
4531 tca tct gat aat cag gaa tat gat gat aat tcc gct cct tct ggt ggt
4579 ttc ttt gtt ccg caa aat gat aat gtt act caa act ttt aaa att aat
4627 aac gtt cgg gca aag gat tta ata cga gtt gtc gaa ttg ttt gta aag
4675 tct aat act tct aaa tcc tca aat gta tta tct att gac ggc tct aat
4723 cta tta gtt gtt agt gcT cct aaa gat att tta gat aac ctt cct



! --- marks a codon that is illegible in the source
4771 ttc ctt tcA act gtt gat ttg cca act gac cag ata --- --- --- ggt
4819 ttg ata ttt gag gtt cag caa ggt gat gct tta gat --- --- --- gct
4867 gct ggc tct cag cgt ggc act gtt gca ggc ggt gtt --- --- --- cgc
4915 ctc acc tct gtt tta tct tct gct ggt ggt tcg ttc --- --- --- aat
4963 ggc gat gtt tta ggg cta tca gtt cgc gca tta aag --- --- --- cat
5011 tca aaa ata ttg tct gtg cca cgt att ctt acg ctt --- --- --- aag
5059 ggt tct atc tct gtT GGC CAg aat gtc cct ttt att --- --- --- gtg
! MscI..
5107 act ggt gaa tct gcc aat gta aat aat cca ttt cag --- --- --- cgt
5155 caa aat gta ggt att tcc atg agc gtt ttt cct gtt --- --- --- ggc
5203 ggt aat att gtt ctg gat att acc agc aag gcc gat --- --- --- tct
5251 tct act cag gca agt gat gtt att act aat caa aga --- --- --- aca
5299 acg gtt aat ttg cgt gat gga cag act ctt tta ctc --- --- --- act
5347 gat tat aaa aac act tct caG gat tct ggc gta ccg --- --- --- aaa
5395 atc cct tta atc ggc ctc ctg ttt agc tcc cgc tct --- --- --- gag
5443 gaa agc acg tta tac gtg ctc gtc aaa gca acc ata --- --- --- ctg

5491 TAG cggcgcatt
! End IV

5503 aagcgcggcg ggtgtggtgg ttacgcgcag cgtgaccgct acacttgcca gcgccctagc
5563 gcccgctcct ttcgctttct tcccttcctt tctcgccacg ttcGCCGGCt ttccccgtca
! NgoMI.
5623 agctctaaat cgggggctcc ctttagggtt ccgatttagt gctttacggc acctcgaccc
5683 caaaaaactt gatttgggtg atggttCACG TAGTGggcca tcgccctgat agacggtttt
! DraIII....
5743 tcgccctttG ACGTTGGAGT Ccacgttctt taatagtgga ctcttgttcc aaactggaac
! DrdI
5803 aacactcaac cctatctcgg gctattcttt tgatttataa gggattttgc cgatttcgga
5863 accaccatca aacaggattt tcgcctgctg gggcaaacca gcgtggaccg cttgctgcaa
5923 ctctctcagg gccaggcggt gaagggcaat CAGCTGttgc cCGTCTCact ggtgaaaaga
! PvuII. BsmBI.
5983 aaaaccaccc tGGATCC AAGCTT
! BamHI HindIII (1/2)

! Insert carrying bla gene

6006 gcaggtg gcacttttcg gggaaatgtg cgcggaaccc
6043 ctatttgttt atttttctaa atacattcaa atatGTATCC gctcatgaga caataaccct
! BciVI
6103 gataaatgct tcaataatat tgaaaaAGGA AGAgt
! RBS . ? . . .

! Start bla gene 

6138 ATG agt att caa cat ttc cgt gtc gcc ctt att ccc ttt ttt gcg gca ttt
6189 tgc ctt cct gtt ttt gct cac cca gaa acg ctg gtg aaa gta aaa gat gct
6240 gaa gat cag ttg ggC gcA CTA GTg ggt tac atc gaa ctg gat ctc aac agc
!                          SpeI
!              ApaLI & BssSI Removed
6291 ggt aag atc ctt gag agt ttt cgc ccc gaa gaa cgt ttt cca atg atg agc
6342 act ttt aaa gtt ctg cta tgt GGC GcG Gta tta tcc cgt att gac gcc ggg
6393 caa gaG CAA CTC GGT CGc cgC ATA cAC tat tct cag aat gac ttg gtt gAG
!          BcgI                                                        ScaI
6444 TAC Tca cca gtc aca gaa aag cat ctt acg gat ggc atg aca gta aga gaa
!    ScaI.
6495 tta tgc agt gct gcc ata acc atg agt gat aac act gcg gcc aac tta ctt
6546 ctg aca aCG ATC Gga gga ccg aag gag cta acc gct ttt ttg cac aac atg
!             PvuI
6597 ggg gat cat gta act cgc ctt gat cgt tgg gaa ccg gag ctg aat gaa gcc
6648 ata cca aac gac gag cgt gac acc acg atg cct gta gca atg Gca aca acg
6699 tTG CGC Aaa cta tta act ggc gaa cta ctt act cta gct tcc cgg caa caa
!     FspI
6750 tta ata gac tgg atg gag gcg gat aaa gtt gca gga cca ctt ctg cgc tcg
6801 GCC ctt ccG GCt ggc tgg ttt att gct gat aaa tct gga gcc ggt gag cgt
!    BglI
6852 gGG TCT Cgc ggt atc att gca gca ctg ggg cca gat ggt aag ccc tcc cgt
!     BsaI
6903 atc gta gtt atc tac acG ACg ggg aGT Cag gca act atg gat gaa cga aat
!                          AhdI
6954 aga cag atc gct gag ata ggt gcc tca ctg att aag cat tgg TAA ctgt
!                                                            stop

7003 cagaccaagt ttactcatat atactttaga ttgatttaaa acttcatttt taatttaaaa
7063 ggatctaggt gaagatcctt tttgataatc tcatgaccaa aatcccttaa cgtgagtttt
7123 cgttccactg tacgtaagac cccc
7147 AAGCTT GTCGAC tgaa tggcgaatgg cgctttgcct
!    HindIII SalI..
!    (2/2)   HincII
7183 ggtttccggc accagaagcg gtgccggaaa gctggctgga gtgcgatctt

Start of Fab-display cassette, the Fab DSR-A05, selected for 
binding to a protein antigen. 

7233 CCTGAcG CTCGAG
!    xBsu36I XhoI..
PlacZ promoter is in the following block 

7246 cgcaacgc aattaatgtg agttagctca 

7274 ctcattaggc accccaggct ttacacttta tgcttccggc tcgtatgttg 
7324 tgtggaattg tgagcggata acaatttcac acaggaaaca gctatgacca 
7374 tgattacgCC AagcttTGGa gccttttttt tggagatttt caac 

PflMI 

Hind3. (there are 3) 



! Gene iii signal sequence:
7463
cac
Caa


Start light 

D I Q 
aac ate caq 


chain (L20:JK1) 

M T Q S 
ata acc caq tct 


P 

cca 


A 
gee 










ApaLI . . . 

Sequence supplied by extender.
BsgI...
EspI







!     C   E   V   T   H   Q   G   L   S   S   P   V   T   K   S
8057 tgc gaa gtc acc cat cag ggc ctG AGC TCg ccc gtc aca aag agc
!                                  SacI....

!     F   N   R   G   E   C
8102 ttc aac agg gga gag tgt taa taa

8126 GGCGCG CCaattctat ttcaaGGAGA cagtcata
!    AscI                    RBS2.

! PelB signal sequence (22 codons) --->
!      1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
!     M   K   Y   L   L   P   T   A   A   A   G   L   L   L   L
8160 atg aaa tac cta ttg cct acg gca gcc gct gga ttg tta tta ctc

! ...PelB signal ---> Start VH, FR1 --->
!     16  17  18  19  20  21  22  23  24  25  26  27  28  29  30
!     A   A   Q   P   A   M   A   E   V   Q   L   L   E   S   G
8205 gcG GCC cag ccG GCC atg gcc gaa gtt CAA TTG tta gag tct ggt
!      SfiI..........                    MfeI...
!                     NcoI....

!     31  32  33  34  35  36  37  38  39  40  41  42  43  44  45
!     G   G   L   V   Q   P   G   G   S   L   R   L   S   C   A
8250 ggc ggt ctt gtt cag cct ggt ggt tct tta cgt ctt tct tgc gct

! ...FR1 ---> CDR1 ---> FR2
!     46  47  48  49  50  51  52  53  54  55  56  57  58  59  60
!     A   S   G   F   T   F   S   T   Y   E   M   R   W   V   R
8295 gct TCC GGA ttc act ttc tct act tac gag atg cgt tgg gtt cgC
!        BspEI..                                               BstXI...

! FR2 ---> CDR2
!     61  62  63  64  65  66  67  68  69  70  71  72  73  74  75
!     Q   A   P   G   K   G   L   E   W   V   S   Y   I   A   P
8340 CAa gct ccT GGt aaa ggt ttg gag tgg gtt tct tat atc gct cct
!    BstXI.

! ...CDR2 ---> FR3 --->
!     76  77  78  79  80  81  82  83  84  85  86  87  88  89  90
!     S   G   G   D   T   A   Y   A   D   S   V   K   G   R   F
8385 tct ggt ggc gat act gct tat gct gac tcc gtt aaa ggt cgc ttc

!     91  92  93  94  95  96  97  98  99 100 101 102 103 104 105
!     T   I   S   R   D   N   S   K   N   T   L   Y   L   Q   M
8430 act atc TCT AGA gac aac tct aag aat act ctc tac ttg cag atg
!            XbaI.
!    Supplied by extender

! ...FR3 --->
!    106 107 108 109 110 111 112 113 114 115 116 117 118 119 120
!     N   S   L   R   A   E   D   T   A   V   Y   Y   C   A   R
8475 aac agC TTA AGg gct gag gac act gca gtc tac tat tgt gcg agg
!          AflII...
!    from extender --->

! CDR3 ---> FR4 --->
!    121 122 123 124 125 126 127 128 129 130 131 132 133 134 135
!     R   L   D   G   Y   I   S   Y   Y   Y   G   M   D   V   W
8520 agg ctc gat ggc tat att tcc tac tac tac ggt atg GAC GTC tgg
!                                                    AatII..

!    136 137 138 139 140 141 142 143 144 145
!     G   Q   G   T   T   V   T   V   S   S
8565 ggc caa ggg acc acG GTC ACC gtc tca agc
!                      BstEII...



! CH1 of IgG1
!     A   S   T   K   G   P   S   V   F   P   L   A   P   S   S
8595 gcc tcc acc aag ggc cca tcg gtc ttc ccc ctg gca ccc tcc tcc

!     K   S   T   S   G   G   T   A   A   L   G   C   L   V   K
8640 aag agc acc tct ggg ggc aca gcg gcc ctg ggc tgc ctg gtc aag

!     D   Y   F   P   E   P   V   T   V   S   W   N   S   G   A
8685 gac tac ttc ccc gaa ccg gtg acg gtg tcg tgg aac tca ggc gcc

!     L   T   S   G   V   H   T   F   P   A   V   L   Q   S   S
8730 ctg acc agc ggc gtc cac acc ttc ccg gct gtc cta cag tCC TCA
!                                                         Bsu36I

!     G   L   Y   S   L   S   S   V   V   T   V   P   S   S   S
8775 GGa ctc tac tcc ctc agc agc gta gtg acc gtg ccc tcc agc agc
!    Bsu36I.

!     L   G   T   Q   T   Y   I   C   N   V   N   H   K   P   S
8820 ttg ggc acc cag acc tac atc tgc aac gtg aat cac aag ccc agc

!     N   T   K   V   D   K   K   V   E   P   K   S   C   A   A
8865 aac acc aag gtg gac aag aaa gtt gag ccc aaa tct tgt GCG GCC
!                                                        NotI

!     A   H   H   H   H   H   H   G   A   A   E   Q   K   L   I
8910 GCa cat cat cat cac cat cac ggg gcc gca gaa caa aaa ctc atc
!    ..NotI. H6 tag                              Myc-Tag

!     S   E   E   D   L   N   G   A   A   q   A   S   S   A
8955 tca gaa gag gat ctg aat ggg gcc gca tag GCT AGC tct gct
!    Myc-Tag                             Amber NheI...



! III 'stump

!     S   G   D   F   D   Y   E   K   M   A   N   A   N   K   G   A
8997 agt ggc gac ttc gac tac gag aaa atg gct aat gcc aac aaa GGC GCC
! tcc tttttag acttgg t ! W.T.
! KasI... (2/4)

!     M   T   E   N   A   D   E   N   A   L   Q   S   D   A   K   G
9045 atG ACT GAG AAC GCT GAC GAG aat gct ttg caa agc gat gcc aag ggt
! c a t c t a c gca g tct cta c ! W.T.

!     K   L   D   S   V   A   T   D   Y   G   A   A   I   D   G   F
9093 aag tta gac agc gTC GCG Acc gac tat GGC GCC gcc ATC GAc ggc ttt
! act t tct t c t t t c ! W.T.
! NruI... KasI... (3/4)

!     I   G   D   V   S   G   L   A   N   G   N   G   A   T   G   D
9141 atc ggc gat gtc agt ggt tTG GCC Aac ggc aac gga gcc acc gga gac
! t tcc c c t t t t t ! W.T.
! MscI (3/3)

!     F   A   G   S   N   S   Q   M   A   Q   V   G   D   G   D   N
9189 ttc GCA GGT tcG AAT TCt cag atg gcC CAG GTT GGA GAT GGg gac aac
! t t t c t ! W.T.
! BspMI. (2/2) EcoRI. XcmI.

!     S   P   L   M   N   N   F   R   Q   Y   L   P   S   L   P   Q
9237 agt ccg ctt atg aac aac ttt aga cag tac ctt ccg tct ctt ccg cag
! tca tta c c t tta a ! W.T.

!     S   V   E   C   R   P   F   V   F   S   A   G   K   P   Y   E
9285 agt gtc gag tgc cgt cca ttc gtt ttc tct gcc ggc aag cct tac gag
! tcg t agc a a ! W.T.

!     F   S   I   D   C   D   K   I   N   L   F   R
9333 ttc aGC Atc gac TGC gat aag atc aat ctt ttC CGC
! t tct tttcaactact ! W.T.
! BstAPI SacII...
! End Domain 3

!     G   V   F   A   F   L   L   Y   V   A   T   F   M   Y   V   F
9369 GGc gtt ttc gct ttc ttg cta tac gtc gct act ttc atg tac gtt ttc
! tctgtcttattcct ta t ! W.T.
! start transmembrane segment

!     S   T   F   A   N   I   L   R   N   K   E   S
9417 aGC ACT TTC GCC AAT ATT TTA Cgc aac aaa gaa agc
! tct gttcacg ttgg tct ! W.T.
! Intracellular anchor.



9453 tag tga tct CCT AGG
!                AvrII..

9468 aag ccc gcc taa tga gcg ggc ttt ttt ttt ct ggt
!    | Trp terminator |

! End Fab cassette

9503 ATGCAT CCTGAGG ccgat actgtcgtcg tcccctcaaa ctggcagatg
!    NsiI.. Bsu36I. (3/3)
9551 cacggttacg atgcgcccat ctacaccaac gtgacctatc ccattacggt caatccgccg
9611 tttgttccca cggagaatcc gacgggttgt tactcgctca catttaatgt tgatgaaagc
9671 tggctacagg aaggccagac gcgaattatt tttgatggcg ttcctattgg ttaaaaaatg
9731 agctgattta acaaaaattt aaTgcgaatt ttaacaaaat attaacgttt acaATTTAAA
!                                                               SwaI...
9791 Tatttgctta tacaatcttc ctgtttttgg ggcttttctg attatcaacc GGGGTAcat

9850 ATG att gac atg cta gtt tta cga tta ccg ttc atc gat tct ctt gtt tgc
! Start gene II
9901 tcc aga ctc tca ggc aat gac ctg ata gcc ttt gtA GAT CTc tca aaa ata
!                                                  BglII...
9952 gct acc ctc tcc ggc atT aat tta tca gct aga acg gtt gaa tat cat att
10003 gat ggt gat ttg act gtc tcc ggc ctt tct cac cct ttt gaa tct tta cct
10054 aca cat tac tca ggc att gca ttt aaa ata tat gag ggt tct aaa aat ttt
10105 tat cct tgc gtt gaa ata aag gct tct ccc gca aaa gta tta cag ggt cat
10156 aat gtt ttt ggt aca acc gat tta gct tta tgc tct gag gct tta ttg ctt
10207 aat ttt gct aat tct ttg cct tgc ctg tat gat tta ttg gat gtt !
! gene II continues

! End of Table
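
The restriction-site annotations in the listing above can be spot-checked mechanically. The following is a minimal sketch, not part of the original disclosure: the function name `find_sites` and the small site dictionary are illustrative assumptions, and the test fragment is the start of the VH region (codons 16 to 27) as read from the listing, with upper case marking the annotated sites.

```python
import re

# Recognition sequences for three enzymes annotated at the start of VH.
# "N" stands for any base.
SITES = {
    "SfiI": "GGCCNNNNNGGCC",
    "NcoI": "CCATGG",
    "MfeI": "CAATTG",
}

def find_sites(seq, sites=SITES):
    """Return {enzyme: [0-based start positions]} on the given strand."""
    seq = seq.replace(" ", "").upper()
    hits = {}
    for name, site in sites.items():
        # A lookahead lets overlapping occurrences be reported as well.
        pattern = "(?=(" + site.replace("N", "[ACGT]") + "))"
        hits[name] = [m.start() for m in re.finditer(pattern, seq)]
    return hits

# Codons 16-27 (end of the PelB signal and start of VH FR1).
vh_start = "gcG GCC cag ccG GCC atg gcc gaa gtt CAA TTG tta"
print(find_sites(vh_start))
```

Each of the three sites is found exactly once in this fragment, in the order SfiI, NcoI, MfeI, which agrees with the annotation lines under that row. Only the given strand is scanned; that is sufficient here because all three recognition sequences are palindromic.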
Table 37: DNA seq of w.t. M13 gene iii

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 
fMKKLLFAI PLVVPFY 
1579 gtg aaa aaa tta tta ttc gca att cct tta gtt gtt cct ttc tat 
Signal sequence 

16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 
SHSAETVESCLAKPH 
1624 tct cac tcc gct gaa act gtt gaa agt tgt tta gca aaa ccc cat
Signal sequence> Domain 1 

31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 
TENSFTNVWKDDKTL 
1669 aca gaa aat tca ttt act aac gtc tgg aaa gac gac aaa act tta
Domain 1 

46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 
DRYANYEGCLWNATG 
1714 gat cgt tac gct aac tat gag ggt tgt ctg tgG AAT GCt aca ggc

BsmI .... 

Domain 1 

61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 
VVVCTGDETQCYGTW 
1759 gtt gta gtt tgt act ggt gac gaa act cag tgt tac ggt aca tgg 
Domain 1 

76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 
VPIGLAI PENEGGGS 
1804 gtt cct att ggg ctt gct atc cct gaa aat gag ggt ggt ggc tct
Domain 1 > Linker 1 

91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 
EGGGS EGGGSEGGGT 
1849 gag ggt ggc ggt tct gag ggt ggc ggt tct gag ggt ggc ggt act 
Linker 1 > 

106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 
KPPEYGDTPI PGYTY 
1894 aaa cct cct gag tac ggt gat aca cct att ccg ggc tat act tat 
Domain 2 

121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 
INPLDGTYPPGTEQN 
1939 atc aac cct ctc gac ggc act taT CCG CCt ggt act gag caa aac

EciI

Domain 2 

136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 
PANPNPSLEESQPLN 
1984 ccc gct aat cct aat cct tct ctt GAG GAG tct cag cct ctt aat

BseRI . . 

Domain 2 

151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 
TFMFQNNRFRNRQGA 
2029 act ttc atg ttt cag aat aat agg ttc cga aat agg cag ggg gca 
Domain 2 

166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 
LTVYTGTVTQGTDPV
2074 tta act gtt tat acg ggc act gtt act caa ggc act gac ccc gtt
Domain 2 

181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 
KTYYQYTPVSS KAMY 
2119 aaa act tat tac cag tac act cct gta tca tca aaa gcc atg tat
Domain 2 

196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 
DAYWNGKFRDCAFHS 
2164 gac gct tac tgg aac ggt aaa ttC AGa gaC TGc gct ttc cat tct

AlwNI 

Domain 2 

211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 
GFNEDPFVCEYQGQS 
2209 ggc ttt aat gaG GAT CCa ttc gtt tgt gaa tat caa ggc caa tcg
BamHI ... 

Domain 2 

226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 
SDLPQPPVNAGGGSG 
2254 tct gac ctg cct caa cct cct gtc aat gct ggc ggc ggc tct ggt
Domain 2 > Linker 2 

241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 
GGSGGGSEGGGSEGG 
2299 ggt ggt tct ggt ggc ggc tct gag ggt ggt ggc tct gag ggt ggc 
Linker 2 

256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 
GSEGGGSEGGGSGGG 
2344 ggt tct gag ggt ggc ggc tct gag gga ggc ggt tcc ggt ggt ggc
Linker 2 

271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 
SGSGDFDYEKMANAN 
2389 tct ggt tcc ggt gat ttt gat tat gaa aag atg gca aac gct aat
Linker 2> Domain 3 

286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 
KGAMTENADENALQS 
2434 aag ggg gct atg acc gaa aat gcc gat gaa aac gcg cta cag tct
Domain 3 

301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 
DAKGKLDSVAT DYGA 
2479 gac gct aaa ggc aaa ctt gat tct gtc gct act gat tac ggt gct
Domain 3 

316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 
AIDGFIGDVSGLANG 
2524 gct atc gat ggt ttc att ggt gac gtt tcc ggc ctt gct aat ggt
Domain 3 

331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 
NGATGDFAGSN SQMA 
2569 aat ggt gct act ggt gat ttt gct ggc tct aat tcc caa atg gct
Domain 3

346 347 348 349 350 351 352 353 354 355 356 357 358 359 360
QVGDGDNSPLMNN FR
2614 caa gtc ggt gac ggt gat aat tca cct tta atg aat aat ttc cgt
Domain 3 

361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 
QYLPSLPQSVECRPF 
2659 caa tat tta cct tcc ctc cct caa tcg gtt gaa tgt cgc cct ttt
Domain 3 

376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 
VFSAGKPYEFS I DCD 
2704 gtc ttt agc gct ggt aaa cca tat gaa ttt tct att gat tgt gac
Domain 3 

391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 
KINLFRGVFAFLLYV 
2749 aaa ata aac tta ttc cgt ggt gtc ttt gcg ttt ctt tta tat gtt 
Domain 3 > Transmembrane segment 

406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 
ATFMYVFSTFANILR 
2794 gcc acc ttt atg tat gta ttt tct acg ttt gct aac ata ctg cgt
Transmembrane segment > ICA- 

421 422 423 424 425 
N K E S 
2839 aat aag gag tct taa ! 2853 

ICA > ICA = intracellular anchor 
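
The amino-acid line printed above each codon row of Table 37 can be reproduced from the codons with the standard genetic code. The sketch below is illustrative only; the names `CODON_TABLE` and `translate` are assumptions, and the example row is row 1759 of Table 37 (Domain 1, residues 61 to 75).

```python
from itertools import product

# Standard genetic code, codons enumerated in t, c, a, g order.
BASES = "tcag"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}

def translate(codons):
    """Translate a space-separated codon string; stop codons become '*'."""
    return "".join(CODON_TABLE[c.lower()] for c in codons.split())

row_1759 = "gtt gta gtt tgt act ggt gac gaa act cag tgt tac ggt aca tgg"
print(translate(row_1759))
```

The translation of this row matches the residues VVVCTGDETQCYGTW given in the table. The gtg initiation codon at position 1579 would translate as V under this table; the listing marks it as fM because it serves as the start codon.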
Table 38: Whole mature III anchor M13-III
derived anchor with recoded DNA

!    1   2   3
!   A   A   A
 1 GCG gcc gca
!  NotI....

!    4   5   6   7   8   9  10  11  12  13  14  15  16  17
!   H   H   H   H   H   H   G   A   A   E   Q   K   L   I
10 cat cat cat cac cat cac ggg gcc gca gaa caa aaa ctc atc

!   18  19  20  21  22  23  24  25  26  27  28  29
!   S   E   E   D   L   N   G   A   A   .   A   S
52 tca gaa gag gat ctg aat ggg gcc gca Tag GCT AGC
!                                          NheI...

!   30  31  32  33  34  35  36  37  38  39
!   D   I   N   D   D   R   M   A   S   T
88 GAT ATC aac gat gat cgt atg gct tct act
! (ON_G37bot) [RC] 5'-c aac gat gat cgt atg gcG CAt GCt gcc gag aca g-3'
!  EcoRV..
!  Enterokinase cleavage site.

! Start mature III (recoded) Domain 1 --->

!      40  41  42  43
!      A   E   T   V
 118 |gcC|gaG|acA|gtC|
! t a t t ! W.T.

!      44  45  46  47  48  49  50  51  52  53  54  55  56  57  58
!      E   S   C   L   A   K   P   H   T   E   N   S   F   T   N
 130 |gaa|TCC|tgC|CTG|GCC|AaG|ccT|caC|acT|gaG|aat|AGT|ttC|aCA|Aat|
! agt tta a a c t a a tca t t c ! W.T.
! MscI

!      59  60  61  62  63  64  65  66  67  68  69  70  71  72  73
!      V   W   K   D   D   K   T   L   D   R   Y   A   N   Y   E
 175 |gtg|TGG|aaG|gaT|gaT|aaG|acC|CtT|gAT|CGA|TaT|gcC|aaT|taC|gaA|
! c accatta tctctg ! W.T.
! BspDI...

!      74  75  76  77  78  79  80  81  82  83  84  85  86  87  88
!      G   C   L   W   N   A   T   G   V   V   V   C   T   G   D
 220 |ggC|tgC|TtA|tgg|aat|gcC|ACC|GGC|GtC|gtT|gtC|TGC|ACG|ggC|gaT|
! ttcg ta tattttc ! W.T.
! SgrAI BsgI....

!      89  90  91  92  93  94  95  96  97  98  99 100 101 102 103
!      E   T   Q   C   Y   G   T   W   V   P   I   G   L   A   I
 265 |gaG|acA|caA|tgC|taT|ggC|ACG|TGg|gtG|ccG|atA|gGC|TTA|GCC|atA|
! atg.tcta tttgcttc ! W.T.
! PmlI BlpI



! Domain 1 ---> Linker 1 --->
!     104 105 106 107 108 109 110 111 112 113 114 115 116 117 118
!      P   E   N   E   G   G   G   S   E   G   G   G   S   E   G
 310 |ccG|gaG|aaC|gaA|ggC|ggC|ggT|AGC|gaA|ggC|ggT|ggC|AGC|gaA|ggC|
! t a t g t t c tct g t c t tct g t ! W.T.

! Linker 1 ---> Domain 2 --->
!     119 120 121 122 123 124 125 126 127 128 129 130 131 132 133
!      G   G   S   E   G   G   G   T   K   P   P   E   Y   G   D
 355 |ggT|GGA|TCC|gaA|ggA|ggT|ggA|acC|aaG|ccG|ccG|gaA|taT|ggC|gaC|
! cttgtcttattgctt ! W.T.
! BamHI.. (2/2)

!     134 135 136 137 138 139 140 141 142 143 144 145 146 147 148
!      T   P   I   P   G   Y   T   Y   I   N   P   L   D   G   T
 400 |acT|ccG|atA|CCT|GGT|taC|acC|taC|atT|aaT|ccG|TtA|gaT|ggA|acC|
! attgctttcctcccct ! W.T.
! SexAI.

!     149 150 151 152 153 154 155 156 157 158 159 160 161 162 163
!      Y   P   P   G   T   E   Q   N   P   A   N   P   N   P   S
 445 |taC|ccT|ccG|ggC|acC|gaA|caG|aaT|ccT|gcC|aaC|ccG|aaC|ccA|AGC|
! TGtttgaccttttt tct ! W.T.
! HindIII...

!     164 165 166 167 168 169 170 171 172 173 174 175 176 177 178
!      L   E   E   S   Q   P   L   N   T   F   M   F   Q   N   N
 490 |TTA|gaA|gaA|AGC|caA|ccG|TtA|aaC|acC|ttT|atg|ttC|caA|aaC|aaC|
! c t G G tct gtctttc tgtt ! W.T.
! HindIII.



!     179 180 181 182 183 184 185 186 187 188 189 190 191 192 193
!      R   F   R   N   R   Q   G   A   L   T   V   Y   T   G   T
 535 |CgT|ttT|AgG|aaC|CgT|caA|gGT|GCT|CtT|acC|gTG|TAC|AcT|ggA|acC|
! ag cca tag g g ata t t t g c t ! W.T.
! HgiAI. BsrGI..

!     194 195 196 197 198 199 200 201 202 203 204 205 206 207 208
!      V   T   Q   G   T   D   P   V   K   T   Y   Y   Q   Y   T
 580 |gtC|acC|caG|GGT|ACC|gaT|ccT|gtC|aaG|acC|taC|taT|caA|taT|acC|
! ttactcctattcgct ! W.T.
! KpnI...

!     209 210 211 212 213 214 215 216 217 218 219 220 221 222 223
!      P   V   S   S   K   A   M   Y   D   A   Y   W   N   G   K
 625 |ccG|gtC|TCG|AGt|aaG|gcT|atg|taC|gaT|gcC|taT|tgg|aaT|ggC|aaG|
! t a a tca ac tctc cta ! W.T.
! BsaI.... XhoI....

!     224 225 226 227 228 229 230 231 232 233 234 235 236 237 238
!      F   R   D   C   A   F   H   S   G   F   N   E   D   P   F
 670 |ttT|CgT|gaT|tgT|gcC|ttT|caC|AGC|ggT|ttC|aaC|gaa|gac|CCt|ttT|
! CAaCctct tct c t t G T a c ! W.T.

!     239 240 241 242 243 244 245 246 247 248 249 250 251 252 253
!      V   C   E   Y   Q   G   Q   S   S   D   L   P   Q   P   P
 715 |gtC|tgC|gaG|taC|caG|ggT|caG|AGT|AGC|gaT|TtA|ccG|caG|ccA|CCG|
! t t a t a c a tcg tct c c g t . a t t ! W.T.
! DrdI AgeI

! Domain 2 ---> Linker 2 --->
!     254 255 256 257 258 259 260 261 262 263 264 265 266 267 268
!      V   N   A   G   G   G   S   G   G   G   S   G   G   G   S
 760 |GTT|AAC|gcG|ggT|ggT|ggT|AGC|ggC|ggA|ggC|AGC|ggC|ggT|ggT|AGC|
! c t t c c c tct t t t tct tcc tct ! W.T.
! AgeI HpaI... HincII.

! Linker 2 ---> Domain 3 --->
!     269 270 271 272 273 274 275 276 277 278 279 280 281 282 283
!      E   G   G   G   S   E   G   G   G   S   G   G   G   S   G
 805 |gaA|ggC|ggA|ggT|AGC|gaA|ggA|ggT|ggC|AGC|ggA|ggC|ggT|AGC|ggC|
! g t t c tct g t c t tct g t c tct t ! W.T.

! Domain 3 --->
!     284 285 286 287 288 289 290 291 292 293 294 295 296 297 298
!      S   G   D   F   D   Y   E   K   M   A   N   A   N   K   G
 850 |AGT|ggC|gac|ttc|gac|tac|gag|aaa|atg|gct|aat|gcc|aac|aaa|GGC|
! tcc tttttag acttgg ! W.T.
! KasI

!     299 300 301 302 303 304 305 306 307 308 309 310 311 312 313
!      A   M   T   E   N   A   D   E   N   A   L   Q   S   D   A
 895 |GCC|atg|act|gag|aac|gct|gac|gaG|AAT|GCA|ctg|caa|agt|gat|gCC|
! t catctacgag tct c t ! W.T.
! KasI.... BsmI.... StyI...

!     314 315 316 317 318 319 320 321 322 323 324 325 326 327 328
!      K   G   K   L   D   S   V   A   T   D   Y   G   A   A   I
 940 |AAG|GGt|aag|tta|gac|agc|gTC|GCc|Aca|gac|tat|ggT|GCt|gcc|atc|
! a c a c t t tct t t t c t ! W.T.
! StyI PflFI

!     329 330 331 332 333 334 335 336 337 338 339 340 341 342 343
!      D   G   F   I   G   D   V   S   G   L   A   N   G   N   G
 985 |gac|ggc|ttt|atc|ggc|gat|gtc|agt|ggt|ctg|gct|aac|ggc|aac|gga|
! t t c t t c t tcc c c t t t t t ! W.T.

!     344 345 346 347 348 349 350 351 352 353
!      A   T   G   D   F   A   G   S   N   S
1030 |gcc|acc|gga|gac|ttc|GCA|GGT|tcG|AAT|TCt|
! ttttttct c ! W.T.
! BstBI... EcoRI... BspMI..

!    354 355 356 357 358 359 360 361 362 363
!     Q   M   A   Q   V   G   D   G   D   N
1060 cag atg gcC CAG GTT GGA GAT GGg gac aac
! a tactcttt ! W.T.
! XcmI

!    364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379
!     S   P   L   M   N   N   F   R   Q   Y   L   P   S   L   P   Q
1090 agt ccg ctt atg aac aac ttt aga cag tac ctt ccg tct ctt ccg cag
! tca tta t t cct a tta t c c t a ! W.T.

!    380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395
!     S   V   E   C   R   P   F   V   F   S   A   G   K   P   Y   E
1138 agt gtc gag tgc cgt cca ttc gtt ttc tct gcc ggc aag cct tac gag
! tcg tatcttct agc t t a a t a ! W.T.

! Domain 3 --->
!    396 397 398 399 400 401 402 403 404 405 406 407
!     F   S   I   D   C   D   K   I   N   L   F   R
1186 ttc aGC Atc gac TGC gat aag atc aat ctt ttC CGC
! t tct tttcaacta t ! W.T.
! BstAPI SacII...

! transmembrane segment --->
!    408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423
!     G   V   F   A   F   L   L   Y   V   A   T   F   M   Y   V   F
1222 GGc gtt ttc gct ttc ttg cta tac gtc gct act ttc atg tac gtt ttc
! tctgtctt at t cct ta t ! W.T.



!    424 425 426 427 428 429 430 431 432 433 434 435
!     S   T   F   A   N   I   L   R   N   K   E   S
1270 aGC ACT TTC GCC AAT ATT TTA Cgc aac aaa gaa agc
!    tct   g   t   t   c   a c g   t   t   g   g tct ! W.T.



Intracellular anchor. 



1306 tag tga tct CCT AGG
!                AvrII..

1321 aag ccc gcc taa tga gcg ggc ttt ttt ttt ct ggt
!    | Trp terminator |

! End Fab cassette

! End of Table
Table 39: ONs to make deletions in III

! ONs for use with NheI

(ON_G29bot) 5'-c gTT gAT ATc gcT AgC cTA Tgc-3' ! 22
! this is the reverse complement of 5'-gca tag gct agc gat atc aac g-3'
! NheI... scab

(ON_G104top) 5'-g|ata|ggc|tta|gcT|aGC|ccg|gag|aac|gaa|gg-3' ! 30
! Scab NheI... 104 105 106 107 108

(ON_G236top) 5'-c|ttt|cac|agc|ggt|ttc|GCT|AGC|gac|cct|ttt|gtc|tgc-3' ! 37
! NheI... 236 237 238 239 240

(ON_G236tCS) 5'-c|ttt|cac|agc|ggt|ttc|GCT|AGC|gac|cct|ttt|gtc|Agc|gag|tac|cag|ggt|c-3' ! 50
! NheI... 236 237 238 239 240

! ONs for use with SphI G CAT Gc

(ON_X37bot) 5'-gAc TgT cTc ggc AgC ATg cgc cAT Acg ATc ATc gTT g-3' ! 37
! NDDRMAHA
! (ON_X37bot) = [RC] 5'-c aac gat gat cgt atg gcG CAt GCt gcc gag aca gtc-3'
! SphI Scab

(ON_X104top) 5'-g|gtG|ccg|ata|ggc|ttG|CAT|GCa|ccg|gag|aac|gaa|gg-3' ! 36
! Scab SphI 104 105 106 107 108

(ON_X236top) 5'-c|ttt|cac|agc|ggt|ttG|CaT|gCa|gac|cct|ttt|gtc|tgc-3' ! 37
! SphI 236 237 238 239 240

(ON_X236tCS) 5'-c|ttt|cac|agc|ggt|ttG|CaT|gCa|gac|cct|ttt|gtc|Agc|gag|tac|cag|ggt|c-3' ! 50
! NheI... 236 237 238 239 240
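
Table 39 states that ON_G29bot and ON_X37bot are the reverse complements of the top-strand sequences given beneath them. A short sketch of that check follows; the helper name `reverse_complement` is an assumption introduced here, and the oligonucleotide strings are transcribed from the table with the spacing removed.

```python
# Map each base to its Watson-Crick partner, preserving case.
COMPLEMENT = str.maketrans("acgtACGT", "tgcaTGCA")

def reverse_complement(seq):
    """Return the reverse complement of a DNA string, ignoring spaces."""
    return seq.replace(" ", "").translate(COMPLEMENT)[::-1]

def same_sequence(a, b):
    """Compare two sequences ignoring spacing and upper/lower case."""
    return a.replace(" ", "").lower() == b.replace(" ", "").lower()

on_g29bot = "c gTT gAT ATc gcT AgC cTA Tgc"
g29_top = "gca tag gct agc gat atc aac g"

on_x37bot = "gAc TgT cTc ggc AgC ATg cgc cAT Acg ATc ATc gTT g"
x37_top = "c aac gat gat cgt atg gcG CAt GCt gcc gag aca gtc"

print(same_sequence(reverse_complement(g29_top), on_g29bot))
print(same_sequence(reverse_complement(x37_top), on_x37bot))
```

Both comparisons hold for the sequences as transcribed, and the lengths (22 and 37 bases) agree with the length comments in the table.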
Table 40: Phage titers and enrichments of selections with
a DY3F31-based human Fab library





                                   Input          Output         Output/input
                                   (total cfu)    (total cfu)    ratio

R1-ox selected on phOx-BSA         4.5 x 10^12    3.4 x 10^5     7.5 x 10^-8

R2-Strep selected on Strep-beads   9.2 x 10^12    3 x 10^8       3.3 x 10^-5
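
The output/input ratios in Table 40 are the output titer divided by the input titer. The sketch below recomputes them; it assumes the table entries are read with decimal commas and lost superscripts restored (for example "4,5 x 10 12" as 4.5e12), and the names `selections` and `recovery_ratio` are illustrative.

```python
# (input cfu, output cfu) for each selection, as read from Table 40.
selections = {
    "R1-ox on phOx-BSA": (4.5e12, 3.4e5),
    "R2-Strep on Strep-beads": (9.2e12, 3e8),
}

def recovery_ratio(input_cfu, output_cfu):
    """Fraction of input phage recovered after selection."""
    return output_cfu / input_cfu

ratios = {name: recovery_ratio(i, o) for name, (i, o) in selections.items()}
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2e}")
```

The recomputed values agree with the tabulated ratios to within rounding of the second significant figure.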



Table 41: Frequency of ELISA positives in 
DY3F31-based Fab libraries 





                              Anti-M13 HRP   9E10/RAM-HRP   Anti-CK/CL Gar-HRP

R2-ox (with IPTG induction)   18/44          10/44          10/44

R2-ox (without IPTG)          13/44          ND             ND

R3-strep (with IPTG)          39/44          38/44          36/44

R3-strep (without IPTG)       33/44          ND             ND



